Redundancy and Availability Features

Every minute of downtime and every dropped session represents lost revenue for the wireless operator, and can lead to customer loss and reduced profitability. With this understanding, we have developed a system whose availability features exceed those found in the majority of today's wireless and wireline access devices.

Service Availability Features

In its recommended redundant configuration, the system provides the highest level of service assurance. The following sections describe the service availability features found in the system.

Hardware Redundancy Features

In addition to providing the highest transaction rates and session capacity, the system is designed to provide robust hardware reliability and service assurance features.

Features of the hardware design include:
  • 1:1 System Management Card (SMC) redundancy
  • 1:n packet processing card redundancy, allowing multiple active cards to be protected by multiple redundant cards, for up to 14 total packet processing cards. 1:1 redundancy is supported for these cards; however, some subscriber sessions and accounting information may be lost in the event of a hardware or software failure, even though the system remains operational.
  • 1:1 Optical (ATM) line card (LC) redundancy (OLC and OLC2)
  • 1:1 Channelized (STM-1/OC-3) Line Card redundancy (CLC and CLC2)
  • 1:1 Quad Gigabit Ethernet Line Card (QGLC) redundancy
  • 1:1 10 Gigabit Ethernet Line Card (XGLC) redundancy
  • 1:1 Switch Processor I/O (SPIO) card redundancy
  • 1:1 Fast Ethernet Line Card (FELC) redundancy
  • 1:1 Gigabit Ethernet Line Card (GELC, GLC2) redundancy
  • Configurable port redundancy (Ethernet, ATM (SGSN), and SPIO line cards)
  • Redundancy Crossbar Card (RCC) for processor-card-to-line card failover using the 280 Gbps Redundancy Bus
  • Self-healing redundant 320 Gbps switching fabric
  • Redundant 32 Gbps Control Bus
  • Redundant Power Filter Units (PFUs)
  • Hot-swappable cards, allowing dynamic replacement while the system is operational

Hardware Redundancy Configuration

The maximum redundant configuration for a fully loaded system supporting data services consists of the following:

  • Two SMCs: 1 active and 1 standby (redundant)
  • 14 processing cards: 13 active and 1 standby
  • Two SPIOs: 1 active and 1 standby
  • 26 half-height Ethernet line cards: 13 active and 13 standby (10/100 Ethernet Line Card (FELC), Gigabit Ethernet Line Card (GELC, GLC2), and Quad Gigabit Ethernet Line Card (QGLC)). These cards support top-bottom (vertical) redundancy.
  • Two full-height 10 Gigabit Ethernet Line Cards (XGLC): 1 active, 1 standby. The XGLC supports side-by-side (horizontal) redundancy.
  • 26 Optical (ATM) line cards: 13 active and 13 standby (OLC and OLC2)
  • 26 Channelized line cards: 13 active and 13 standby (CLC and CLC2)
  • Two RCCs: 2 standby

This configuration allows for the highest session capacity while still providing redundancy. The following figures depict this recommended maximum redundant configuration.

Figure 1. Recommended Redundant Configuration for Data Services - Front View

Figure 2. Recommended Redundant Configuration for Data Services - Rear View

Software Assurance Features

Numerous features are built into the system software to ensure the continuation of service in the case of software process failures. SMC software controls the management contexts and overall system control, while processing card software controls the PPP sessions, AAA, and VPN processes.

Session Recovery Feature

This licensed software feature performs an automatic recovery of all fully established subscriber sessions should a session manager task failure occur.

With this feature enabled, there is no loss of session information, as described in the table above. Session recovery consists of the migration and recreation of control and data packet state information, subscriber session statistics, and session time parameters such as the idle timer.

This feature is enabled or disabled on a chassis-wide basis and requires additional processing card hardware to ensure that enough reserve resources (memory, processing, and so on) are available to fully recover sessions in the event of a software or hardware failure.

Interchassis Session Recovery

The Interchassis Session Recovery (ICSR) feature provides the highest possible availability for continuous call processing without interrupting subscriber services. This is accomplished through the use of redundant chassis. The chassis are configured as primary and backup with one being active and one inactive. Both chassis are connected to the AAA server. When calls pass the checkpoint duration timer, checkpoint data is sent from the active chassis to the inactive chassis. If the active chassis handling the call traffic goes out of service, the inactive chassis transitions to the active state and continues processing the call traffic without interrupting the subscriber session.

The chassis determine which is active through a proprietary TCP-based connection called a redundancy link. This link is used to exchange status messages between the primary and backup chassis and must be maintained for proper system operation. In the event the redundancy link goes out of service, ICSR is maintained through the use of authentication probes and BGP peer monitoring. BGP routing must be enabled.
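The checkpoint-and-failover behavior described above can be pictured with a toy model. This is only an illustrative sketch, not the product's implementation: the class names, the `checkpoint_after_s` parameter, and the session dictionaries are all hypothetical, and the redundancy link, authentication probes, and BGP monitoring are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Chassis:
    """One chassis in the pair (hypothetical model, not a product API)."""
    name: str
    active: bool
    sessions: dict = field(default_factory=dict)  # call_id -> session state


class IcsrPair:
    """Toy active/standby pair that checkpoints calls past a timer."""

    def __init__(self, primary: Chassis, backup: Chassis, checkpoint_after_s: int):
        self.primary = primary
        self.backup = backup
        self.checkpoint_after_s = checkpoint_after_s

    def active(self) -> Chassis:
        return self.primary if self.primary.active else self.backup

    def standby(self) -> Chassis:
        return self.backup if self.primary.active else self.primary

    def report_call(self, call_id: str, age_s: int, state: dict) -> None:
        """Track a call on the active chassis; checkpoint it once old enough."""
        self.active().sessions[call_id] = state
        if age_s >= self.checkpoint_after_s:
            # Calls past the checkpoint duration timer are copied to the standby.
            self.standby().sessions[call_id] = dict(state)

    def fail_active(self) -> Chassis:
        """Take the active chassis out of service; the standby takes over."""
        failed, survivor = self.active(), self.standby()
        failed.active = False
        failed.sessions.clear()
        survivor.active = True
        return survivor


pair = IcsrPair(Chassis("primary", True), Chassis("backup", False), checkpoint_after_s=60)
pair.report_call("call-1", age_s=120, state={"ip": "192.0.2.1"})
pair.report_call("call-2", age_s=5, state={"ip": "192.0.2.2"})
survivor = pair.fail_active()
print(survivor.name, sorted(survivor.sessions))
```

In this model only the call that had passed the checkpoint timer survives the failover, which mirrors the statement that checkpoint data is sent once calls pass the checkpoint duration timer.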

Mean Time Between Failure

Mean Time Between Failure (MTBF) data provides statistical information about the length of time that is expected to elapse before a particular card or system fails. This information is calculated using the following method:

Calculated MTBF - Expected elapsed time before failure occurs using the method defined in Telcordia TR-NWT-000332-CORE. This is based on reliability of components and design factors.

Failure per million hours (Fpmh) identifies the predicted failure rate of a system component per one million hours of operation; that is, for every 1,000,000 hours of operation, the “FITS number” of failures would be expected to occur.
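The relationship between MTBF and Fpmh can be sketched as follows. This is a minimal illustration that assumes a constant failure rate, so that Fpmh is one million divided by the MTBF in hours; the function name is hypothetical, and the source does not state how its published figures were rounded. The sample MTBF values are taken from Table 1.

```python
def fpmh_from_mtbf(mtbf_hours: float) -> float:
    """Failures per million hours, assuming a constant failure rate."""
    if mtbf_hours <= 0:
        raise ValueError("MTBF must be positive")
    return 1_000_000 / mtbf_hours


# MTBF values (hours) quoted from Table 1 for three components.
for name, mtbf in [("SMC", 104_372), ("SPIO", 333_999), ("PFU", 967_118)]:
    print(f"{name}: {fpmh_from_mtbf(mtbf):.2f} failures per million hours")
```

Some rows in the table, notably the fan units, list Fpmh values that differ from this simple reciprocal, so treat the calculation as an approximation rather than the method used to produce the table.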

The following table shows the MTBF characteristics of each major component of the system.


Table 1. Mean Time Between Failure Statistics
Cisco PID           Description                                   MTBF (Hours)  MTBF (Years)  Fpmh (Failure per million hours)
ASR5000-CHSSYS-K9=  Chassis with Midplane                         16,386,995    1869.38       0.061
ASR5K-SMC-K9=       System Management Card (SMC)                  104,372       11.91         9.58
ASR5K-PSC-K9=       Packet Services Card (PSC)                    102,294       11.68         9.78
ASR5K-PSC-32G-K9=   Packet Services Card (PSC2, PSC3, PSCA, PPC)  93,950        10.75         10.64
ASR5K-PSC-64G-K9=
ASR5K-PSC-16G-K9=
ASR5K-PPC-K9=
ASR5K-0110G-MM-K9=  10 Gigabit Ethernet Line Card (XGLC)          247,720       28.28         4.04
ASR5K-0110G-SM-K9=
ASR5K-041GE-SX-K9=  Quad Gig-E Card (QGLC)                        258,606       29.52         3.867
ASR5K-041GE-T-K9=
ASR5K-041GE-LX-K9=
ASR5K-01OC3-SM-K9=  ATM/POS OC-3 SM IR-1 Card                     214,492       48.6          4.66
                    optical daughter card                         1,419,581     73.4          0.70
ASR5K-SPIO-BNC-K9=  Switch Processor I/O Card (SPIO)              333,999       38.13         2.99
ASR5K-SPIO-3PN-K9=
ASR5K-SPS3-BNC-K9=
ASR5K-SPS3-3PN-K9=
ASR5K-RCC-K9=       Redundancy Crossbar Card (RCC)                555,862       63.46         1.79
ASR5K-01100E-K9=    Fast Ethernet Card (FELC)                     495,886       56.61         2.01
ASR5K-011GE-SX-K9=  Gigabit Ethernet Card (GELC)                  396,715       45.29         2.52
ASR5K-011GE-LX-K9=
ASR5K-011GE-T-K9=
ASR5K-011G2-SX-K9   Gigabit Ethernet Card v2 (GLC2)               396,715       45.29         2.52
ASR5K-011G2-LX-K9
ASR5K-011G2-T-K9
ASR5K-PFU=          Power Filter Unit (PFU)                       967,118       110.40        1.03
ASR5K-FANT-LW=      Fan Tray Unit - Lower                         70,517        8.05          19.51
ASR5K-FANT-UP=      Fan Blower Unit - Upper                       120,178       13.72         18.72



System Availability

System-level Mean Time To Failure (MTTF) is the average interval of time that a component will operate before failing. Reliability information is based on the number of overall anticipated failures of the individual components, in conjunction with any redundancy schemes employed to minimize the impact of such failures.

The following table provides service availability calculations (based on reliability modeling) for the ASR 5000 platform.
Table 2. Hardware Platform Availability Information
Platform  Operational Uptime (%)  Yearly Downtime (minutes)  MTTF (Hours)  MTTF (Years)
ASR 5000  99.999978               0.12                       14,077,473    1605.91


One suggestion to help improve overall system availability is to institute an on-site spares program, wherein key components are housed locally with the deployed equipment. The following section defines a recommended spares program and quantities for the system.

Mean Time To Repair (MTTR) is the amount of time needed to repair a component, recover the system, or otherwise restore service after a failure. The system availability calculations assume an MTTR of four hours, the industry standard.
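The arithmetic linking uptime, downtime, MTTF, and MTTR can be sketched as follows. The function names are hypothetical. The first function converts an uptime percentage into minutes of downtime per year, assuming a 365-day year. The second applies the textbook steady-state formula MTTF / (MTTF + MTTR) for a single repairable unit; the reliability model behind Table 2 is not given in this document, so this formula is an assumption and need not reproduce the published figures exactly.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, assuming a 365-day year


def yearly_downtime_minutes(uptime_percent: float) -> float:
    """Expected minutes of downtime per year for a given uptime percentage."""
    return (1.0 - uptime_percent / 100.0) * MINUTES_PER_YEAR


def steady_state_availability(mttf_hours: float, mttr_hours: float) -> float:
    """Textbook availability of one repairable unit (assumed model)."""
    return mttf_hours / (mttf_hours + mttr_hours)


# Uptime figure quoted from Table 2.
print(f"{yearly_downtime_minutes(99.999978):.2f} minutes of downtime per year")

# MTTF from Table 2 with the four-hour MTTR stated in the text.
availability = steady_state_availability(14_077_473, 4)
print(f"{availability * 100:.6f} % availability under the simple model")
```

The first calculation agrees with the yearly downtime listed in Table 2 when rounded to two decimal places. The second gives a slightly different percentage than the table, which is expected because the published numbers come from reliability modeling that is not reproduced here.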

Spare Component Recommendations

This section provides recommended quantities of spare parts for use in a spare components program for the system. The information is provided as a guideline only, and should be used as a starting point for designing a spares program that meets your company's design, deployment, and availability goals.

It is recommended that your company either has fully-trained personnel available to effect the exchange of Field Replaceable Units (FRUs) within your network, or requests on-site or field engineering resources to perform such duties.

Based on industry-leading redundancy and failover features found in the system, the following minimum spare parts levels for any planned deployment are recommended.
Table 3. Recommended FRU Parts Sparing Quantities
Component Name                             Minimum number of spares  For every “n” number of deployed components
ASR 5000 Chassis with Midplane             1                         20
System Management Card (SMC)               1                         10
Packet Services Cards (PSCx, PPC)          1                         12
Quad Gig-E Line Card (QGLC)                1                         20
10 Gigabit Ethernet Line Card (XGLC)       1                         20
Optical Line Card (OLC or OLC2)            1                         20
Channelized Line Card (CLC or CLC2)        1                         20
Switch Processor I/O Card (SPIO)           1                         18
Redundancy Crossbar Card (RCC)             1                         30
Fast Ethernet Line Card (FELC)             1                         25
Gigabit Ethernet Line Card (GELC or GLC2)  1                         25
Power Filter Unit (PFU)                    1                         30
Upper Fan Tray Unit                        1                         8
Lower Fan Tray Unit                        1                         5
Particulate Air Filter                     1                         1
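
As a worked example, the sparing ratios in Table 3 can be turned into a spare count for a given deployment. This is a minimal sketch: the dictionary keys and function name are hypothetical, only a few rows of the table are included, and rounding up to the next whole spare for a partial group of "n" components is an assumption, since the table states only the minimum ratio.

```python
import math

# (minimum spares, per "n" deployed components), quoted from Table 3.
SPARING_RULES = {
    "SMC": (1, 10),
    "PSC": (1, 12),
    "SPIO": (1, 18),
    "PFU": (1, 30),
    "Lower Fan Tray": (1, 5),
}


def recommended_spares(component: str, deployed: int) -> int:
    """Minimum spares for a deployment, rounding partial groups up (assumed)."""
    if deployed <= 0:
        return 0
    spares, per_n = SPARING_RULES[component]
    return spares * math.ceil(deployed / per_n)


# Hypothetical deployment counts, for illustration only.
deployment = {"SMC": 24, "PSC": 12, "PFU": 31, "Lower Fan Tray": 12}
for component, count in deployment.items():
    print(f"{component}: {count} deployed, {recommended_spares(component, count)} spare(s)")
```

Rounding up is the conservative choice: a site with 24 SMCs would hold three spares rather than two.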