Cisco ASR 9000 Series Aggregation Services Router Overview and Reference Guide
High Availability and Redundant Operation

Table of Contents

High Availability and Redundant Operation

Features Overview

High Availability Router Operations

Stateful Switchover

Fabric Switchover

Active/Standby Status Interpretation

Non-Stop Forwarding

Nonstop Routing

Graceful Restart

Process Restartability

Fault Detection and Management

Power Supply Redundancy

AC Power Redundancy

Cisco ASR 9010 AC Power Redundancy

Cisco ASR 9006 AC Power Redundancy

Cisco ASR 9904 AC Power Redundancy

Cisco ASR 9922 AC Power Redundancy

Cisco ASR 9912 AC Power Redundancy

DC Power Redundancy

Cisco ASR 9010 DC Power Redundancy

Cisco ASR 9006 DC Power Redundancy

Cisco ASR 9904 DC Power Redundancy

Cisco ASR 9922 DC Power Redundancy

Cisco ASR 9912 DC Power Redundancy

Detection and Reporting of Power Problems

Cooling System Redundancy

Cooling Failure Alarm

High Availability and Redundant Operation

This chapter describes the high availability and redundancy features of the Cisco ASR 9000 Series Routers.

Features Overview

The Cisco ASR 9000 Series Routers are designed for a high Mean Time Between Failures (MTBF) and a low Mean Time To Resolve (MTTR), providing a reliable platform that minimizes outages and downtime and maximizes availability.

In addition, the Cisco ASR 9000 Series Routers offer the following high availability (HA) features to enhance network level resiliency and enable network-wide protection:

High Availability Router Operations

Stateful Switchover

The RSP/RP cards are deployed in an “active/standby” configuration. Stateful switchover (SSO) preserves state and configuration information if a switchover to the standby RSP/RP card occurs. The standby RSP/RP card holds a mirror image of protocol state, user configuration, interface state, subscriber state, system state, and other parameters. If a hardware or software failure occurs in the active RSP/RP card, the standby RSP/RP card changes state to become the active RSP/RP card. A stateful switchover has no impact on traffic forwarding.
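The active/standby behavior described above can be pictured with a small model. The following Python sketch is illustrative only: the names `RouteProcessor`, `RedundantPair`, `update`, and `switchover` are hypothetical and do not correspond to Cisco IOS XR internals. It shows a standby card holding a mirrored copy of state, so that a role change loses nothing.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class RouteProcessor:
    """Hypothetical model of one RSP/RP card (not an IOS XR structure)."""
    name: str
    role: str                                  # "active" or "standby"
    state: dict = field(default_factory=dict)


class RedundantPair:
    """Toy active/standby pair that mirrors state on every update."""

    def __init__(self, active: RouteProcessor, standby: RouteProcessor):
        self.active = active
        self.standby = standby

    def update(self, key: str, value) -> None:
        # Apply the change on the active card, then mirror a copy to the standby.
        self.active.state[key] = value
        self.standby.state[key] = copy.deepcopy(value)

    def switchover(self) -> None:
        # The standby takes over; the failed card is modelled as returning
        # in the standby role (an assumption made for this sketch).
        self.active.role, self.standby.role = "standby", "active"
        self.active, self.standby = self.standby, self.active


pair = RedundantPair(
    RouteProcessor("RSP0", "active"),
    RouteProcessor("RSP1", "standby"),
)
pair.update("interface_state", {"example-interface": "up"})
pair.switchover()
print(pair.active.name, pair.active.role, pair.active.state)
```

Because every update is mirrored before the failure, the newly active card already holds the interface state and does not need to relearn it.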

Fabric Switchover

  • In the Cisco ASR 9010 Router, Cisco ASR 9006 Router, and Cisco ASR 9904 Router, the RSP card makes up most of the fabric. The fabric is configured in an “active/active” configuration model, which allows the traffic load to be distributed across both RSP cards. If one RSP card fails, the remaining active switch fabric continues to forward traffic in the system.
  • In the Cisco ASR 9922 Router and Cisco ASR 9912 Router, fabric switching across the RP and line cards is provided by a separate set of seven OIR FC cards operating in 6+1 redundancy mode. Any FC card can be removed from the chassis, power-cycled, or provisioned to remain unpowered without impacting system traffic. All FC cards remain active unless disabled or faulty. Traffic from the line cards is distributed across all FC cards.
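The 6+1 arrangement above can be expressed as a simple check. This is a minimal sketch: the function names and the status labels are hypothetical, and the treatment of fewer than six active cards as "degraded" is an assumption, since the text only describes the loss of a single FC card.

```python
TOTAL_FC_CARDS = 7      # from the text: seven FC cards
REQUIRED_FC_CARDS = 6   # from the text: the "6" in 6+1 redundancy


def fabric_status(active_cards: int) -> str:
    """Classify fabric health from the number of active FC cards."""
    if not 0 <= active_cards <= TOTAL_FC_CARDS:
        raise ValueError("active_cards must be between 0 and 7")
    if active_cards == TOTAL_FC_CARDS:
        return "redundant"
    if active_cards == REQUIRED_FC_CARDS:
        return "full capacity, no spare"
    # Assumption: fewer than six cards is treated as reduced capacity.
    return "degraded"


def traffic_share(active_cards: int) -> float:
    """Fraction of line-card traffic per FC card, assuming an even spread."""
    if active_cards <= 0:
        raise ValueError("at least one active FC card is required")
    return 1.0 / active_cards


for cards in (7, 6, 5):
    print(cards, fabric_status(cards), round(traffic_share(cards), 3))
```

With all seven cards active, each carries one seventh of the traffic under the even-spread assumption. Removing one card raises the share per card to one sixth without reducing total capacity.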

Active/Standby Status Interpretation

Status signals from each RSP/RP card are monitored to determine active/standby status and whether a failure has occurred that requires a switchover from one RSP/RP card to the other.

Non-Stop Forwarding

Cisco IOS XR Software supports non-stop forwarding (NSF) to enable the forwarding of packets without traffic loss during a brief outage of the control plane. NSF is implemented through signaling and routing protocol implementations for graceful restart extensions as standardized by the Internet Engineering Task Force (IETF).

For example, a soft reboot of certain software modules does not hinder the network processors, the switch fabric, or the physical interfaces from forwarding packets. Similarly, a soft reset of a non-data path device (such as an Ethernet Out-of-Band Channel Gigabit Ethernet switch) does not impact the forwarding of packets.

Nonstop Routing

Nonstop routing (NSR) allows forwarding of data packets to continue along known routes while the routing protocol information is being refreshed following a processor switchover. NSR maintains protocol sessions and state information across SSO functions for services such as MPLS VPN. TCP connections and the routing protocol sessions are migrated from the active RSP/RP card to the standby RSP/RP card after the RSP/RP switchover without letting peers know about the switchover. The sessions terminate and the protocols running on the standby RSP/RP card reestablish the sessions after the standby RSP/RP goes active. NSR can also be used with graceful restart to protect the routing control plane during switchovers. The NSR functionality is available only for Open Shortest Path First Protocol (OSPF) and Label Distribution Protocol (LDP) routing technologies.

Graceful Restart

Graceful restart (GR) provides a control plane mechanism to ensure high availability by allowing detection and recovery from failure conditions while preserving Nonstop Forwarding (NSF) services. Graceful restart is a way to recover from signaling and control plane failures without impacting the forwarding plane. Cisco IOS XR Software combines graceful restart with checkpointing, mirroring, route switch processor redundancy, and other system resiliency features to recover before a timeout and to avoid the service downtime that network reconvergence would otherwise cause.

Process Restartability

The Cisco IOS XR distributed and modular microkernel operating system enables process independence, restartability, and maintenance of memory and operational states. Each process runs in a protected address space. Checkpointing facilities, reliable transports, and retransmission features enable processes to be restarted without impacting other components and with minimal or no disruption of traffic. In most cases, a process that fails, crashes, or incurs a fault restarts itself. For example, if a Border Gateway Protocol (BGP) or Quality of Service (QoS) process incurs a fault, it restarts and resumes its normal routine without impacting other processes.
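The restart-on-fault idea can be illustrated with a generic supervisor loop. This sketch is not how Cisco IOS XR implements process restart: `supervise`, `flaky_task`, and the `max_restarts` limit are hypothetical names and parameters chosen for the example.

```python
from typing import Callable


def supervise(task: Callable[[], None], max_restarts: int = 3) -> int:
    """Run task, restarting it after each fault; return the restart count."""
    restarts = 0
    while True:
        try:
            task()
            return restarts
        except Exception:
            # Give up once the (hypothetical) restart budget is spent.
            if restarts >= max_restarts:
                raise
            restarts += 1


attempts = {"count": 0}


def flaky_task() -> None:
    """Simulated process that faults on its first two runs."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("simulated fault")


restart_count = supervise(flaky_task)
print(restart_count)
```

Only the faulting task is rerun. Nothing else in the program is touched, which mirrors the isolation that protected address spaces provide.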

Fault Detection and Management

To minimize service outages, the Cisco ASR 9000 Series Routers provide a rapid and efficient response to single or multiple system component or network failures. When local fault handling cannot recover from critical faults, the system offers robust fault detection, correction, failover, and event management capabilities.

  • Fault detection and correction—In hardware, the Cisco ASR 9000 Series Routers offer error correcting code (ECC)-protected memory. If a memory corruption occurs, the system automatically restarts the impacted processes to fix the problem with minimum impact. If the problem is persistent, the Cisco ASR 9000 supports switchover and online insertion and removal (OIR) capabilities to allow replacement of defective hardware without impacting services on other hardware components in the system.
  • Resource management—Cisco ASR 9000 Series Routers support resource threshold monitoring for CPU and memory utilization to improve out of resource (OOR) management. When threshold conditions are met or exceeded, the system generates an OOR alarm to notify operators of OOR conditions. The system then automatically attempts recovery, and allows the operator to configure flexible policies using the embedded event manager.
  • Online diagnostics—Cisco ASR 9000 Series Routers provide built-in online diagnostics to monitor functions such as network path failure detection, packet diversion failures, faulty fabric link detections, etc. The tests are configurable through the CLI.
  • Event management—Cisco ASR 9000 Series Routers offer mechanisms such as fault-injection testing to detect hardware faults during lab testing, a system watchdog mechanism to recover failed processes, and tools such as the Route Consistency Checker to diagnose inconsistencies between the routing and forwarding tables.
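The resource threshold monitoring described in the list above can be sketched as follows. The `Threshold` type, the `check_resources` function, and the percentage values are hypothetical and are not Cisco defaults. The comparison uses "greater than or equal" because the text says an alarm is generated when a threshold is met or exceeded.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Threshold:
    """Hypothetical utilization limit for one resource, in percent."""
    resource: str
    limit_percent: float


def check_resources(usage: Dict[str, float],
                    thresholds: List[Threshold]) -> List[str]:
    """Return one out-of-resource (OOR) alarm per threshold met or exceeded."""
    alarms = []
    for threshold in thresholds:
        value = usage.get(threshold.resource)
        if value is not None and value >= threshold.limit_percent:
            alarms.append(
                f"OOR alarm: {threshold.resource} at {value:.0f}% "
                f"(threshold {threshold.limit_percent:.0f}%)"
            )
    return alarms


# Example limits and readings are placeholders for illustration.
limits = [Threshold("cpu", 90.0), Threshold("memory", 85.0)]
alarms = check_resources({"cpu": 95.0, "memory": 40.0}, limits)
print(alarms)
```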

Power Supply Redundancy

The Cisco ASR 9000 Series Routers are configured such that a power module failure or its subsequent replacement does not cause a significant outage. When a power supply failure or an overvoltage or undervoltage condition at the output of a power module is detected, an alarm is raised.

AC Power Redundancy

The AC power modules have a modular design that allows replacement without any outage. At least one fully loaded AC tray is required to power a fully loaded system. The slot location of a module in the tray is irrelevant as long as each tray holds an equal number of modules, so that the system remains powered if one tray fails.


Note AC power redundancy for the Cisco ASR 9010 Router, Cisco ASR 9922 Router, and Cisco ASR 9912 Router requires that power modules be installed in multiple power trays.


Cisco ASR 9010 AC Power Redundancy

The Cisco ASR 9010 Router supports the version 1, version 2, and version 3 power systems.

Figure 3-1 shows the AC power module configuration for the version 1 power system. Figure 3-2 shows the AC power module configuration for the version 2 power system. Figure 3-3 shows the AC power module configuration for the version 3 power system.

Figure 3-1 AC System Power Redundancy for the Cisco ASR 9010 Router—Version 1

 

Figure 3-2 AC System Power Redundancy for the Cisco ASR 9010 Router—Version 2

 

Figure 3-3 AC System Power Redundancy for the Cisco ASR 9010 Router—Version 3

 

Cisco ASR 9006 AC Power Redundancy

The Cisco ASR 9006 Router supports the version 1 and version 2 power systems. Figure 3-4 shows an example of the AC power module configuration for the version 2 power system.

Figure 3-4 AC System Power Redundancy for the Cisco ASR 9006 Router—Version 2

 

Cisco ASR 9904 AC Power Redundancy

The Cisco ASR 9904 Router supports the version 2 power system. Figure 3-5 shows the AC power module configuration for the version 2 power system.

Figure 3-5 AC System Power Redundancy for the Cisco ASR 9904 Router—Version 2

 

Cisco ASR 9922 AC Power Redundancy

The Cisco ASR 9922 router supports the version 2 and version 3 power systems. Figure 3-6 shows the AC power module configuration for the version 2 power system. Figure 3-7 shows the AC power module configurations for the version 3 power system.

Figure 3-6 AC System Power Redundancy for the Cisco ASR 9922 Router—Version 2

 

Figure 3-7 AC System Power Redundancy for the Cisco ASR 9922 Router—Version 3

 

Cisco ASR 9912 AC Power Redundancy

The Cisco ASR 9912 router supports the version 2 and version 3 power systems. Figure 3-8 shows the AC power module configuration for the version 2 power system. Figure 3-9 shows the AC power module configuration for the version 3 power system.

Figure 3-8 AC System Power Redundancy for the Cisco ASR 9912 Router—Version 2

 

Figure 3-9 AC System Power Redundancy for the Cisco ASR 9912 Router—Version 3

 

DC Power Redundancy

The DC power modules have a modular design that allows replacement without any outage. Each tray houses up to three version 1 power modules or four version 2 power modules.

The Cisco ASR 9000 Series Routers offer two DC power modules, a 2100 W module and a 1500 W module. Both types of power modules can be used in a single chassis. See Appendix A, “Technical Specifications,” for power module specifications. The slot location of a module in a tray is irrelevant as long as N+1 modules are installed.
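The N+1 condition can be checked with simple arithmetic: N is the smallest number of modules that can carry the load, and one more module provides the spare. The sketch below assumes all installed modules are of the same type and ignores derating and feed limits. The function names and the 4000 W load are hypothetical; only the 2100 W and 1500 W ratings come from the text.

```python
import math


def modules_needed(load_watts: float, module_watts: int) -> int:
    """N: the minimum number of modules able to carry the load."""
    return math.ceil(load_watts / module_watts)


def is_n_plus_1(load_watts: float, installed: int, module_watts: int) -> bool:
    """True if the installed modules cover the load with one spare."""
    return installed >= modules_needed(load_watts, module_watts) + 1


example_load = 4000.0  # hypothetical system load in watts

# 2100 W modules: N = ceil(4000 / 2100) = 2, so N+1 requires 3 modules.
print(modules_needed(example_load, 2100), is_n_plus_1(example_load, 3, 2100))

# 1500 W modules: N = ceil(4000 / 1500) = 3, so N+1 requires 4 modules.
print(modules_needed(example_load, 1500), is_n_plus_1(example_load, 3, 1500))
```

For actual sizing, use the module ratings and system power budgets in Appendix A rather than the placeholder load shown here.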

Redundant –48 VDC power feeds are separately routed to each power tray. For maximum diversity, the power entry points to each tray are spatially separated at the left and right edges of the tray. In both version 1 and version 2 power systems, each feed can supply the full power consumed by a module. The load is shared between the feeds. Each power module in the tray can draw power from either feed, which allows a power feed to be maintained or replaced without causing an interruption.

Cisco ASR 9010 DC Power Redundancy

The Cisco ASR 9010 router supports the version 1, version 2, and version 3 power systems.

Figure 3-10 shows the DC power module configuration for the version 1 power system. Figure 3-11 shows the DC power module configuration for the version 2 power system. Figure 3-12 shows the DC power module configuration for the version 3 power system.

Figure 3-10 DC System Power Redundancy for the Cisco ASR 9010 Router—Version 1

 

 

Figure 3-11 DC System Power Redundancy for the Cisco ASR 9010 Router—Version 2

 

Figure 3-12 DC System Power Redundancy for the Cisco ASR 9010 Router—Version 3

 

Cisco ASR 9006 DC Power Redundancy

The Cisco ASR 9006 router supports the version 1 and version 2 power systems. Figure 3-13 shows an example of the DC power module configuration for the version 2 power system.

Figure 3-13 DC System Power Redundancy for the Cisco ASR 9006 Router—Version 2

 

Cisco ASR 9904 DC Power Redundancy

The Cisco ASR 9904 router supports the version 2 power system. Figure 3-14 shows the DC power module configuration for the version 2 power system.

Figure 3-14 DC System Power Redundancy for the Cisco ASR 9904 Router—Version 2

 

Cisco ASR 9922 DC Power Redundancy

The Cisco ASR 9922 router supports the version 2 and version 3 power systems.

Figure 3-15 shows the DC power module configuration for the version 2 power system. Figure 3-16 shows the DC power module configuration for the version 3 power system.

Figure 3-15 DC System Power Redundancy for the Cisco ASR 9922 Router—Version 2

 

Figure 3-16 DC System Power Redundancy for the Cisco ASR 9922 Router—Version 3

 

Cisco ASR 9912 DC Power Redundancy

The Cisco ASR 9912 router supports the version 2 and version 3 power systems.

Figure 3-17 shows the DC power module configuration for the version 2 power system. Figure 3-18 shows the DC power module configuration for the version 3 power system.

Figure 3-17 DC System Power Redundancy for the Cisco ASR 9912 Router—Version 2

 

Figure 3-18 DC System Power Redundancy for the Cisco ASR 9912 Router—Version 3

 


Note The Cisco ASR 9000 Series Routers are capable of operating with one power module. However, such a configuration does not provide any redundancy.


Detection and Reporting of Power Problems

All –48 VDC feed and return lines have fuses and are monitored. Any blown fuse is detected and reported. The input voltages are monitored against overvoltage and undervoltage alarm thresholds.
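The monitoring logic described above amounts to a fuse check followed by a range check. In this sketch the function name and both threshold values are placeholders chosen for illustration and are not taken from the Cisco specifications. Voltages are compared by magnitude because the feeds are negative with respect to return.

```python
# Placeholder thresholds, expressed as voltage magnitudes. These values are
# assumptions for illustration, not figures from the Cisco specifications.
UNDER_VOLTAGE_LIMIT = 40.0
OVER_VOLTAGE_LIMIT = 72.0


def classify_feed(volts: float, fuse_ok: bool = True) -> str:
    """Classify one DC feed from its fuse state and measured voltage."""
    if not fuse_ok:
        return "fuse blown"
    magnitude = abs(volts)  # a -48 VDC feed is compared by magnitude
    if magnitude < UNDER_VOLTAGE_LIMIT:
        return "undervoltage"
    if magnitude > OVER_VOLTAGE_LIMIT:
        return "overvoltage"
    return "normal"


for reading in (-48.0, -38.0, -75.0):
    print(reading, classify_feed(reading))
print(classify_feed(-48.0, fuse_ok=False))
```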

Cooling System Redundancy

The Cisco ASR 9000 Series Routers are configured in such a way that a fan failure or its subsequent replacement does not cause a significant outage. During either a fan replacement or a fan failure, the airflow is maintained and no outage occurs. Also, the fan trays are hot swappable so that no outage occurs during replacement. For information on redundancy values for the Cisco ASR 9000 Series Routers, see Table 1-7.

Cooling Failure Alarm

Temperature sensors are installed in all cards and fan trays. These sensors detect and report any fan failure or high temperature condition, and raise an alarm. Fan failure can be a fan stopping, fan controller failure, power failure, or a failure of a communication link to the RSP/RP card.

Every card has temperature measurement points in the hottest expected area to clearly indicate a cooling failure. The line cards have two sensors, one at the inlet and one near the hottest devices on the card. The RSP/RP card also has two sensors.