Understanding System-Level High Availability
This chapter describes the Cisco NX-OS HA system and application restart operations and includes the following sections:
•Information About Cisco NX-OS System-Level High Availability
•Supervisor Restarts and Switchovers
•Displaying HA Status Information
Information About Cisco NX-OS System-Level High Availability
Cisco NX-OS system-level HA mitigates the impact of hardware or software failures and is supported by the following features:
•Redundant hardware components:
–Supervisor
–Switch fabric
–Power supply
–Fan trays
For details about physical requirements and redundant hardware components, respectively, see the Cisco Nexus 7000 Series Site Preparation Guide and the Cisco Nexus 7000 Series Hardware Installation and Reference Guide.
•HA software features:
–In-service software upgrades (ISSU) — For details about configuring and performing nondisruptive upgrades, see Chapter 5, "Understanding In-Service Software Upgrades."
–Nonstop forwarding (NSF) — For details about nonstop forwarding, also known as graceful restart, see the Cisco Nexus 7000 Series NX-OS Unicast Routing Configuration Guide, Release 4.1.
–Virtual device contexts (VDCs) — For details about VDCs and HA, see the Cisco Nexus 7000 Series NX-OS Virtual Device Context Configuration Guide, Release 4.1.
–Generic online diagnostics (GOLD) — For details about configuring GOLD, see the Cisco Nexus 7000 Series NX-OS System Management Configuration Guide, Release 4.1.
–Embedded event manager (EEM) — For details about configuring EEM, see the Cisco Nexus 7000 Series NX-OS System Management Configuration Guide, Release 4.1.
–Smart Call Home — For details about configuring Smart Call Home, see the Cisco Nexus 7000 Series NX-OS System Management Configuration Guide, Release 4.1.
Virtualization Support
For information about system-level high availability within a virtual device context (VDC), see the "VDC High Availability" section.
Note For complete information on VDCs, see the Cisco Nexus 7000 Series NX-OS Virtual Device Context Configuration Guide, Release 4.1.
Licensing Requirements
The following table shows the licensing requirements for system-level high availability features:
For a complete explanation of the Cisco NX-OS licensing scheme and how to obtain and apply licenses, see the Cisco Nexus 7000 Series NX-OS Licensing Guide, Release 4.1.
Physical Redundancy
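The Nexus 7000 series includes the following physical redundancies:
•Power supply redundancy
•Fan tray redundancy
•Switch fabric redundancy
•Supervisor module redundancy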
For additional details about physical redundancies, see the Cisco Nexus 7000 Series Site Preparation Guide and the Cisco Nexus 7000 Series Hardware Installation and Reference Guide.
Power Supply Redundancy
The Nexus 7000 series supports up to three power supply modules on a Cisco Nexus 7010 switch and up to four power supply modules on a Cisco Nexus 7018 switch. Each power supply module can deliver up to 7.5 kW, depending on the number of inputs and the input line voltage. By installing two or more modules, you can ensure that the failure of one module does not disrupt system operations. You can replace the failed module while the system is operating. For information on power supply module installation and replacement, see the Cisco Nexus 7000 Series Hardware Installation and Reference Guide.
For further redundancy, each power supply module includes two internal isolated power units, which give each module two power paths. A fully populated Cisco Nexus 7010 chassis, with three power supply modules, therefore has six power paths in total. In addition, the power subsystem allows the power supplies to be configured in any one of four redundancy modes.
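The following Python sketch restates the redundancy arithmetic above: two power paths per module, and a rough check of whether the load still fits after one module fails. It is not a Cisco tool. The function names and the simple N+1 budgeting rule (lose the largest module, keep the rest) are assumptions for illustration; the four real redundancy modes described in Table 4-1 budget power differently.

```python
# Hypothetical helpers, not a Cisco tool: they only restate the arithmetic above.
PATHS_PER_MODULE = 2  # two internal isolated power units per power supply module


def power_paths(module_count: int) -> int:
    """Independent power paths in a chassis with module_count supplies installed."""
    return PATHS_PER_MODULE * module_count


def survives_one_failure(capacities_kw: list[float], load_kw: float) -> bool:
    """Return True if the load still fits after the largest module fails.

    Assumption: simple N+1 budgeting. The real redundancy modes (Table 4-1)
    budget power differently, so treat this only as a rough check.
    """
    if len(capacities_kw) < 2:
        return False  # a single module has nothing to fail over to
    remaining_kw = sum(capacities_kw) - max(capacities_kw)
    return remaining_kw >= load_kw


print(power_paths(3))                               # 6 (Nexus 7010, fully populated)
print(survives_one_failure([7.5, 7.5, 7.5], 12.0))  # True: 15.0 kW left after a failure
print(survives_one_failure([7.5, 7.5], 9.0))        # False: only 7.5 kW left
```

The per-module capacities are passed in as a list because, as noted above, the usable output of each module depends on its inputs and line voltage.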
Power Modes
Each of the four available power redundancy modes imposes different power budgeting and allocation models, which in turn deliver varying usable power yields and capacities. For more information regarding power budgeting, usable capacity, planning requirements, and redundancy configuration, see the Cisco Nexus 7000 Series Hardware Installation and Reference Guide.
The available power supply redundancy modes are described in Table 4-1.
Fan Tray Redundancy
The Cisco Nexus 7000 series chassis contains two redundant system fan trays for I/O module cooling and two additional fan trays for switch fabric module cooling. Only one of each pair of fan trays is required to provide system cooling.
The fan speeds are variable and are automatically adjusted to one of 16 levels in order to optimize system cooling while minimizing overall system noise and power draw. A detected failure of a fan within a given fan tray will trigger an increase in the speed of the remaining fans to compensate for the failure. A detected removal of an entire fan tray, without replacement, will initiate a system shutdown after a three-minute warning period.
Starting with Cisco NX-OS Release 5.0(2a), the fan shutdown policy for the 10-slot chassis is as follows:
•If a system fan is removed: Earlier releases shut off the other fan in 3 minutes. The new policy is to increase the speed of the other fan based on the table mapping.
•If a fabric fan is removed: Earlier releases shut off the other fan in 3 minutes. The new policy is to increase the speed of the other fan to the maximum.
Caution In the case of a fan tray failure, you must leave the failed unit in place to ensure proper airflow until a replacement is made available. The fan trays are hot swappable, but you must complete the removal and replacement within three minutes to avoid an automatic system shutdown.
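As an illustration of the release-dependent policy above, the following Python sketch returns what the remaining fan does when one fan tray of a pair is removed from a 10-slot chassis. The function names and the release-string parser are hypothetical; the parser assumes release strings of the form 5.0(2a) and compares them field by field.

```python
import re

# Hypothetical model of the 10-slot chassis fan-removal policy described above.
_RELEASE_RE = re.compile(r"(\d+)\.(\d+)\((\d+)([a-z]*)\)")


def parse_release(release: str) -> tuple[int, int, int, str]:
    """Turn a release string such as '5.0(2a)' into a comparable tuple."""
    match = _RELEASE_RE.fullmatch(release)
    if match is None:
        raise ValueError(f"unrecognized release string: {release!r}")
    major, minor, maintenance, suffix = match.groups()
    return int(major), int(minor), int(maintenance), suffix


NEW_POLICY_RELEASE = parse_release("5.0(2a)")


def fan_removal_action(release: str, tray: str) -> str:
    """Describe how the remaining fan tray of a pair reacts when one is removed."""
    if tray not in ("system", "fabric"):
        raise ValueError("tray must be 'system' or 'fabric'")
    if parse_release(release) < NEW_POLICY_RELEASE:
        return "shut off the other fan in 3 minutes"
    if tray == "system":
        return "raise the other fan's speed per the table mapping"
    return "raise the other fan's speed to maximum"


print(fan_removal_action("4.2(6)", "system"))   # earlier release behavior
print(fan_removal_action("5.0(2a)", "fabric"))  # new policy for fabric fans
```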
Switch Fabric Redundancy
Cisco NX-OS provides switching fabric availability through redundant switch fabric modules. You can configure a single Nexus 7000 series chassis with one to five switch fabric modules for capacity and redundancy. Each I/O module installed in the system automatically connects to and uses all functional switch fabric modules. A failure of a switch fabric module triggers an automatic reallocation and balancing of traffic across the remaining active switch fabric modules. Replacing the failed fabric module reverses this process: after you insert the replacement fabric module and bring it online, traffic is again redistributed across all installed fabric modules and redundancy is restored.
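The following Python sketch shows the rebalancing idea above with five fabric modules, one of which fails and is then replaced. It assumes traffic is split evenly across the active fabric modules; this guide does not describe the hardware's actual distribution algorithm, and the function name is hypothetical.

```python
# Hypothetical sketch: even load sharing across the fabric modules that are up.
# Assumption: traffic is split evenly; the real distribution is not described here.


def fabric_shares(total_gbps: float, fabric_up: dict[int, bool]) -> dict[int, float]:
    """Map each active fabric slot to its share of the I/O modules' traffic."""
    if not 1 <= len(fabric_up) <= 5:
        raise ValueError("a chassis holds one to five switch fabric modules")
    active = sorted(slot for slot, up in fabric_up.items() if up)
    if not active:
        raise RuntimeError("no active switch fabric modules")
    share = total_gbps / len(active)
    return {slot: share for slot in active}


fabrics = {slot: True for slot in range(1, 6)}
print(fabric_shares(100.0, fabrics))   # 20.0 on each of five modules
fabrics[3] = False                     # fabric module 3 fails
print(fabric_shares(100.0, fabrics))   # 25.0 on each of the other four
fabrics[3] = True                      # replacement comes online
print(fabric_shares(100.0, fabrics))   # back to 20.0 each
```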
Supervisor Module Redundancy
The Nexus 7000 series supports dual supervisor modules to provide 1+1 redundancy for the control and management plane. A dual supervisor configuration operates in an active/standby capacity in which only one of the supervisor modules is active at any given time, while the other acts as a standby backup. The state and configuration remain constantly synchronized between the two supervisor modules to provide stateful switchover in the event of a supervisor module failure.
The Cisco NX-OS Generic Online Diagnostics (GOLD) subsystem and additional monitoring processes on the supervisor trigger a stateful failover to the redundant supervisor when they detect unrecoverable critical failures, service restartability errors, kernel errors, or hardware failures.
If a supervisor-level unrecoverable failure occurs, the currently active, failed supervisor triggers a switchover. The standby supervisor becomes the new active supervisor and uses the synchronized state and configuration while the failed supervisor is reloaded. If the failed supervisor is able to reload and pass self-diagnostics, it initializes, becomes the new standby supervisor, and then synchronizes its operating state with the newly active unit.
For additional details on supervisor switchovers, see the "Supervisor Restarts and Switchovers" section.
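The following Python sketch models the failover sequence described above as role changes on two supervisor objects. It is illustrative only: the class, the role names, and the slot numbers are hypothetical and do not correspond to any NX-OS interface.

```python
from dataclasses import dataclass, field

# Hypothetical model of a 1+1 supervisor failover; roles and slots are illustrative.


@dataclass
class Supervisor:
    slot: int
    role: str                                   # "active", "standby", "reloading", "failed"
    history: list[str] = field(default_factory=list)

    def set_role(self, role: str) -> None:
        self.history.append(role)
        self.role = role


def fail_over(active: Supervisor, standby: Supervisor, passes_self_test: bool) -> None:
    """Apply the sequence described above after an unrecoverable failure on `active`."""
    if active.role != "active" or standby.role != "standby":
        raise ValueError("need one active and one standby supervisor")
    standby.set_role("active")        # takes over using the synchronized state
    active.set_role("reloading")      # the failed supervisor is reloaded
    # It rejoins as the new standby only if it passes self-diagnostics,
    # then resynchronizes with the newly active unit.
    active.set_role("standby" if passes_self_test else "failed")


sup_a, sup_b = Supervisor(5, "active"), Supervisor(6, "standby")
fail_over(sup_a, sup_b, passes_self_test=True)
print(sup_a.role, sup_b.role)   # standby active
```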
Supervisor Restarts and Switchovers
This section includes the following topics:
•Restarts on Single Supervisors
•Restarts on Dual Supervisors
•Switchovers on Dual Supervisors
•Replacing the Active Supervisor Module in a Dual Supervisor System
•Replacing the Standby Supervisor Module in a Dual Supervisor System
Restarts on Single Supervisors
In a system with only one supervisor, when all HA policies have been unsuccessful in restarting a service, the supervisor restarts. The supervisor and all services reset and start with no prior state information.
Restarts on Dual Supervisors
When a supervisor-level failure occurs in a system with dual supervisors, the System Manager will perform a switchover rather than a restart to maintain stateful operation. In some cases, however, a switchover may not be possible at the time of the failure. For example, if the standby supervisor module is not in a stable standby state, a restart rather than a switchover is performed.
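The restart and switchover rules in the two sections above reduce to a small decision, sketched here in Python. The function name is hypothetical; the standby status is assumed to be the Status value that the show module command reports for the standby supervisor (for example, ha-standby).

```python
from typing import Optional

# Hypothetical decision helper restating the single/dual supervisor rules above.


def supervisor_failure_action(supervisor_count: int, standby_status: Optional[str]) -> str:
    """Return "switchover" or "restart" for a supervisor-level failure.

    standby_status is the standby module's Status value as shown by
    show module (for example "ha-standby"), or None if there is no standby.
    """
    if supervisor_count not in (1, 2):
        raise ValueError("a chassis has one or two supervisor modules")
    if supervisor_count == 2 and standby_status == "ha-standby":
        return "switchover"   # stateful: the standby takes over
    return "restart"          # supervisor and services restart with no prior state


print(supervisor_failure_action(1, None))          # restart
print(supervisor_failure_action(2, "ha-standby"))  # switchover
```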
Switchovers on Dual Supervisors
A dual supervisor configuration allows nonstop forwarding (NSF) with stateful switchover (SSO) when a supervisor-level failure occurs. The two supervisors operate in an active/standby capacity in which only one of the supervisor modules is active at any given time, while the other acts as a standby backup. The two supervisors constantly synchronize the state and configuration in order to provide a seamless and stateful switchover of most services if the active supervisor module fails.
Switchover Characteristics
An HA switchover has the following characteristics:
•It is stateful (nondisruptive) because control traffic is not affected.
•It does not disrupt data traffic because the switching modules are not affected.
•Switching modules are not reset.
•It does not reload the Connectivity Management Processor (CMP).
Switchover Mechanisms
Switchovers occur by one of the following two mechanisms:
•The active supervisor module fails and the standby supervisor module automatically takes over.
•You manually initiate a switchover from an active supervisor module to a standby supervisor module.
When a switchover process begins, another switchover process cannot be started on the same switch until a stable standby supervisor module is available.
Switchover Failures
If a switchover does not complete successfully within 28 seconds, the supervisors will reset. A reset prevents loops in the Layer 2 network if the network topology was changed during the switchover. For optimal performance of this recovery function, we recommend that you do not change the Spanning Tree Protocol (STP) default timers.
If three system-initiated switchovers occur within 20 minutes, all nonsupervisor modules will shut down to prevent switchover cycling. The supervisors remain operational to allow you to collect system logs before resetting the switch.
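The two safeguards above, the 28-second switchover timeout and the limit of three system-initiated switchovers in 20 minutes, can be modeled with a sliding window, as in the following Python sketch. The constants come from the text; the names, the use of seconds as plain numbers, and the treatment of the window boundary (an event exactly 20 minutes old still counts) are assumptions.

```python
from collections import deque

# Hypothetical model of the two safeguards above; only the constants are from the text.
SWITCHOVER_TIMEOUT_S = 28
CYCLE_LIMIT = 3
CYCLE_WINDOW_S = 20 * 60


def switchover_outcome(elapsed_s: float) -> str:
    """A switchover that has not completed within 28 seconds resets the supervisors."""
    return "completed" if elapsed_s <= SWITCHOVER_TIMEOUT_S else "reset supervisors"


class CyclingGuard:
    """Tracks system-initiated switchovers in a sliding 20-minute window."""

    def __init__(self) -> None:
        self._times: deque = deque()

    def record(self, t_s: float) -> bool:
        """Record a switchover at t_s seconds; True means shut down nonsupervisor modules."""
        self._times.append(t_s)
        while t_s - self._times[0] > CYCLE_WINDOW_S:
            self._times.popleft()   # forget switchovers older than the window
        return len(self._times) >= CYCLE_LIMIT


guard = CyclingGuard()
print([guard.record(t) for t in (0, 300, 900)])   # [False, False, True]
```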
Manually Initiating a Switchover
To manually initiate a switchover from an active supervisor module to a standby supervisor module, use the system switchover command. After you run this command, you cannot start another switchover process on the same system until a stable standby supervisor module is available.
Note If the standby supervisor module is not in a stable state (ha-standby), a manually initiated switchover is not performed.
To ensure that an HA switchover is possible, use the show system redundancy status command or the show module command. If the command output displays the ha-standby state for the standby supervisor module, you can manually initiate a switchover.
Switchover Guidelines
Follow these guidelines when performing a switchover:
•When you manually initiate a switchover, you will see system messages that indicate the presence of two supervisor modules.
•A switchover can only be performed when two supervisor modules are functioning in the switch.
•The modules in the chassis must be functioning.
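Before you enter the system switchover command, you can apply the guidelines above to the Module-Type and Status columns of the show module output. The following Python sketch does that over a hand-copied list of (Module-Type, Status) pairs; it is a hypothetical helper, not a Cisco tool, and it assumes supervisor rows have a Module-Type that starts with "Supervisor", as in the sample output.

```python
# Hypothetical pre-switchover check over (Module-Type, Status) pairs copied from
# show module output; it restates the guidelines above and is not a Cisco tool.


def switchover_ready(modules: list[tuple[str, str]]) -> bool:
    """True if exactly two supervisors are active/ha-standby and every other module is ok."""
    # "active *" marks the supervisor running this terminal session; drop the marker.
    statuses = [(mtype, status.rstrip(" *")) for mtype, status in modules]
    sup_states = sorted(s for mtype, s in statuses if mtype.startswith("Supervisor"))
    if sup_states != ["active", "ha-standby"]:
        return False
    return all(s == "ok" for mtype, s in statuses if not mtype.startswith("Supervisor"))


sample = [  # rows from the sample show module output (one supervisor only)
    ("1000 Mbps Optical Ethernet Modul", "ok"),
    ("Supervisor module-1X", "active *"),
    ("10/100/1000 Mbps Ethernet Module", "ok"),
]
print(switchover_ready(sample))   # False: no ha-standby supervisor
```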
Verifying Switchover Possibilities
This section describes how to verify the status of the switch and the modules before a switchover:
•Use the show system redundancy status command to ensure that the system is ready to accept a switchover. For information about the show system redundancy status command, see the "Displaying HA Status Information" section.
•Use the show module command to verify the status (and presence) of a module at any time. A sample output of the show module command follows:
switch# show module
Mod  Ports  Module-Type                       Model               Status
---  -----  --------------------------------  ------------------  ------------
7    48     1000 Mbps Optical Ethernet Modul  N7K-M148GS-11       ok
10   0      Supervisor module-1X              N7K-SUP1            active *
12   48     10/100/1000 Mbps Ethernet Module  NURBURGRING         ok

Mod  Sw              Hw
---  --------------  ------
7    4.1(4)          0.202
10   4.1(4)          1.2
12   4.1(4)          0.407

Mod  MAC-Address(es)                         Serial-Num
---  --------------------------------------  ----------
7    00-1b-54-c2-ed-d0 to 00-1b-54-c2-ee-04  JAF1219AGFE
10   00-24-98-6f-7b-e0 to 00-24-98-6f-7b-e8  JAF1307ALAT
12   00-19-07-6c-4d-a8 to 00-19-07-6c-4d-dc  JAB104400P0

Mod  Online Diag Status
---  ------------------
7    Pass
10   Pass
12   Pass

Xbar Ports  Module-Type                       Model               Status
---  -----  --------------------------------  ------------------  ------------
1    0      Fabric Module 1                   N7K-C7018-FAB-1     ok

Xbar Sw              Hw
---  --------------  ------
1    NA              0.101

Xbar MAC-Address(es)                         Serial-Num
---  --------------------------------------  ----------
1    NA                                      JAF1225AGHJ

* this terminal session
switch#
The Status column in the output should display an OK status for switching modules and an active or ha-standby status for supervisor modules.
•Use the show boot auto-copy command to verify the configuration of the auto-copy feature and whether an auto-copy to the standby supervisor module is in progress. Sample outputs of the show boot auto-copy command are as follows:
switch# show boot auto-copy
Auto-copy feature is enabled
switch# show boot auto-copy list
No file currently being auto-copied
Replacing the Active Supervisor Module in a Dual Supervisor System
You can nondisruptively replace the active supervisor module in a dual supervisor system.
To replace the active supervisor module, follow these steps:
Step 1 Initiate a manual switchover to the standby supervisor.
switch# system switchover
Raw time read from Hardware Clock: Y=2009 M=2 D=2 07:35:48
writing reset reason 7,
NX7 SUP Ver 3.17.0
Serial Port Parameters from CMOS
PMCON_1: 0x200
PMCON_2: 0x0
PMCON_3: 0x3a
PM1_STS: 0x1
Performing Memory Detection and Testing
Testing 1 DRAM Patterns
Total mem found : 4096 MB
Memory test complete.
NumCpus = 2.
Status 61: PCI DEVICES Enumeration Started
Status 62: PCI DEVICES Enumeration Ended
Status 9F: Dispatching Drivers
Status 9E: IOFPGA Found
Status 9A: Booting From Primary ROM
Status 98: Found Cisco IDE
Status 98: Found Cisco IDE
Status 90: Loading Boot Loader
Reset Reason Registers: 0x1 0x10
Filesystem type is ext2fs, partition type 0x83
Filesystem type is ext2fs, partition type 0x83
GNU GRUB version 0.97
Loader Version 3.17.0
current standby sup
----------------------------
switch(standby)# 2009 Feb 2 07:35:46 switch %$ VDC-1 %$ %KERN-2-SYSTEM_MSG: Switchover started by redundancy driver - kernel
2009 Feb 2 07:35:47 switch %$ VDC-1 %$ %SYSMGR-2-HASWITCHOVER_PRE_START: This supervisor is becoming active (pre-start phase).
2009 Feb 2 07:35:47 switch %$ VDC-1 %$ %SYSMGR-2-HASWITCHOVER_START: This supervisor is becoming active.
2009 Feb 2 07:35:48 switch %$ VDC-1 %$ %SYSMGR-2-SWITCHOVER_OVER: Switchover completed.
Note Wait until the switchover completes and the standby supervisor becomes active.
Step 2 Power down the supervisor module you are replacing.
switch# out-of-service module 6
Step 3 Replace the supervisor module. For information on replacing a supervisor module, see the Cisco Nexus 7000 Series Hardware Installation and Reference Guide.
Note The replacement supervisor module should boot from the active supervisor module after six minutes. Use the reload module slot-number force command to force the boot.
Step 4 Copy the kickstart image from the active supervisor module to the standby supervisor module.
switch# copy bootflash:n7000-s1-kickstart.4.1.2.gbin.S30 bootflash://sup-remote/n7000-s1-kickstart.4.1.2.gbin.S30
Step 5 Copy the system image from the active supervisor module to the standby supervisor module.
switch# copy bootflash:n7000-s1-dk9.4.1.2.gbin.S30 bootflash://sup-remote/n7000-s1-dk9.4.1.2.gbin.S30
Step 6 Configure the standby supervisor boot variables.
switch# config t
switch(config)# boot kickstart bootflash://sup-remote/n7000-s1-kickstart.4.1.2.gbin.S30
switch(config)# boot system bootflash://sup-remote/n7000-s1-dk9.4.1.2.gbin.S30
switch(config)# end
Step 7 Save these changes to the startup configuration.
switch# copy running-config startup-config
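Because Steps 4 through 7 use the same file name on both supervisors, you can generate the command list from the two image names, as in the following Python sketch. The helper is hypothetical and does not connect to the switch; review its output before you paste it into a session. The end command (to return to EXEC mode before saving) is included as an assumption about your session flow.

```python
# Hypothetical helper: builds the image-copy and boot-variable commands from the
# steps above for a given pair of image files. It does not talk to the switch.

REMOTE = "bootflash://sup-remote/"


def standby_image_commands(kickstart: str, system: str) -> list[str]:
    """Return the Step 4 to Step 7 commands for the given bootflash image names."""
    for name in (kickstart, system):
        if not name or " " in name or "/" in name:
            raise ValueError(f"expected a bare bootflash file name, got {name!r}")
    return [
        f"copy bootflash:{kickstart} {REMOTE}{kickstart}",
        f"copy bootflash:{system} {REMOTE}{system}",
        "config t",
        f"boot kickstart {REMOTE}{kickstart}",
        f"boot system {REMOTE}{system}",
        "end",
        "copy running-config startup-config",
    ]


for cmd in standby_image_commands("n7000-s1-kickstart.4.1.2.gbin.S30",
                                  "n7000-s1-dk9.4.1.2.gbin.S30"):
    print(cmd)
```

The same helper applies to the standby replacement procedure that follows; only the image names differ.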
Replacing the Standby Supervisor Module in a Dual Supervisor System
You can nondisruptively replace the standby supervisor module in a dual supervisor system.
To replace the standby supervisor module, follow these steps:
Step 1 Power down the standby supervisor module.
switch# out-of-service module 6
Step 2 Replace the supervisor module. For information on replacing a supervisor module, see the Cisco Nexus 7000 Series Hardware Installation and Reference Guide.
Note The replacement supervisor module should boot from the active supervisor module after six minutes. Use the reload module slot-number force command to force the boot.
Step 3 Copy the kickstart image from the active supervisor module to the standby supervisor module.
switch# copy bootflash:n7000-s1-kickstart.4.1.2.bin bootflash://sup-remote/n7000-s1-kickstart.4.1.2.bin
Step 4 Copy the system image from the active supervisor module to the standby supervisor module.
switch# copy bootflash:n7000-s1-dk9.4.1.2.bin bootflash://sup-remote/n7000-s1-dk9.4.1.2.bin
Step 5 Configure the standby supervisor boot variables.
switch# config t
switch(config)# boot kickstart bootflash://sup-remote/n7000-s1-kickstart.4.1.2.bin
switch(config)# boot system bootflash://sup-remote/n7000-s1-dk9.4.1.2.bin
switch(config)# end
Step 6 Save these changes to the startup configuration.
switch# copy running-config startup-config
Displaying HA Status Information
Use the show system redundancy status command to view the HA status of the system. Tables 4-2 to 4-4 explain the possible output values for the redundancy, supervisor, and internal states.
switch# show system redundancy status
Redundancy mode
---------------
      administrative:   HA
         operational:   HA

This supervisor (sup-1)
-----------------------
    Redundancy state:   Active
    Supervisor state:   Active
      Internal state:   Active with HA standby

Other supervisor (sup-2)
------------------------
    Redundancy state:   Standby
    Supervisor state:   HA standby
      Internal state:   HA standby
The following conditions identify when automatic synchronization is possible:
•If the internal state of one supervisor module is Active with HA standby and the other supervisor module is ha-standby, the system is operationally HA and can perform automatic synchronization.
•If the internal state of one of the supervisor modules is none, the system cannot perform automatic synchronization.
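To check these two conditions from a script, you can parse the show system redundancy status output into sections and compare the internal states, as in the following Python sketch. The parser is hypothetical and assumes the "Key: value" layout of the sample output above; other releases may format the output differently.

```python
# Hypothetical parser for show system redundancy status text, plus the
# automatic-synchronization rule above. Assumes the sample's "Key: value" layout.


def parse_redundancy_status(text: str) -> dict[str, dict[str, str]]:
    sections: dict[str, dict[str, str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or set(line) == {"-"} or line.startswith("switch#"):
            continue                      # blank lines, underlines, the prompt
        if ":" in line:
            if current is not None:
                key, value = line.split(":", 1)
                sections[current][key.strip()] = value.strip()
        else:
            current = line                # a section title such as "Other supervisor (sup-2)"
            sections[current] = {}
    return sections


def auto_sync_possible(sections: dict[str, dict[str, str]]) -> bool:
    states = [s["Internal state"] for s in sections.values() if "Internal state" in s]
    if any(state.lower() == "none" for state in states):
        return False
    return sorted(states) == ["Active with HA standby", "HA standby"]


SAMPLE = """\
Redundancy mode
---------------
      administrative:   HA
         operational:   HA

This supervisor (sup-1)
-----------------------
    Redundancy state:   Active
    Supervisor state:   Active
      Internal state:   Active with HA standby

Other supervisor (sup-2)
------------------------
    Redundancy state:   Standby
    Supervisor state:   HA standby
      Internal state:   HA standby
"""

print(auto_sync_possible(parse_redundancy_status(SAMPLE)))   # True
```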
Table 4-2 lists the possible values for the redundancy states.
Table 4-3 lists the possible values for the supervisor module states.
Table 4-4 lists the possible values for the internal redundancy states.
Table 4-4 Internal States
State | Description
HA standby | The HA switchover mechanism in the standby supervisor module is enabled (see the "Manually Initiating a Switchover" section).
Active with no standby | No standby supervisor module is available, so a switchover is not possible.
Active with HA standby | The active supervisor module in the switch is ready to be configured. The standby supervisor module is in the ha-standby state.
Shutting down | The system is being shut down.
HA switchover in progress | The system is in the process of switching over to the standby supervisor module.
Offline | The system is intentionally shut down for debugging purposes.
HA synchronization in progress | The standby supervisor module is in the process of synchronizing its state with the active supervisor module.
Standby (failed) | The standby supervisor module is not functioning.
Active with failed standby | Both supervisor modules are present, but the standby supervisor module is not functioning.
Other | The system is in a transient state. If this state persists, contact Cisco TAC.
VDC High Availability
The Cisco NX-OS software incorporates high availability (HA) features that minimize the impact on the data plane if the control plane fails or a switchover occurs. The different HA service levels provide data plane protection, including service restarts, stateful supervisor module switchovers, and in-service software upgrades (ISSUs). All of these high availability features support VDCs.
If unrecoverable errors occur in a VDC, the Cisco NX-OS software provides HA policies that you can specify for each VDC. These HA policies include the following:
•Bringdown: Puts the VDC in the failed state. To recover from the failed state, you must reload the physical device.
•Reset: Initiates a supervisor module switchover for a Cisco NX-OS device with two supervisor modules, or reloads a Cisco NX-OS device with one supervisor module.
•Restart: Deletes the VDC and recreates it by using the startup configuration.
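The following Python sketch maps each VDC HA policy above to its documented outcome, including the dependence of the Reset policy on the number of supervisor modules. It models outcomes only; the enum and function names are hypothetical and are not the NX-OS configuration syntax for setting a VDC HA policy.

```python
from enum import Enum

# Hypothetical model of the three VDC HA policies listed above; it is not the
# NX-OS configuration syntax, only the documented outcome of each policy.


class VdcHaPolicy(Enum):
    BRINGDOWN = "bringdown"
    RESET = "reset"
    RESTART = "restart"


def vdc_failure_action(policy: VdcHaPolicy, supervisor_count: int) -> str:
    """Return what happens after an unrecoverable error in a VDC with this policy."""
    if supervisor_count not in (1, 2):
        raise ValueError("a device has one or two supervisor modules")
    if policy is VdcHaPolicy.BRINGDOWN:
        return "VDC failed; reload the physical device to recover"
    if policy is VdcHaPolicy.RESET:
        return "supervisor switchover" if supervisor_count == 2 else "reload device"
    return "delete the VDC and recreate it from the startup configuration"


for policy in VdcHaPolicy:
    print(policy.value, "->", vdc_failure_action(policy, supervisor_count=2))
```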
For details about VDCs and HA, see the Cisco Nexus 7000 Series NX-OS Virtual Device Context Configuration Guide, Release 4.1.
Additional References
For additional information related to implementing system-level HA features, see the following sections:
•MIBs
•RFCs
Related Documents
Standards
Standards | Title
No new or modified standards are supported by this feature, and support for existing standards has not been modified by this feature.
—
MIBs
MIBs | MIBs Link
•CISCO-SYSTEM-EXT-MIB: ciscoHaGroup, cseSwCoresTable, cseHaRestartNotify, cseShutDownNotify, cseFailSwCoreNotify, cseFailSwCoreNotifyExtended
•CISCO-PROCESS-MIB
•CISCO-RF-MIB
To locate and download MIBs, go to the following URL:
http://www.cisco.com/public/sw-center/netmgmt/cmtk/mibs.shtml
RFCs
Technical Assistance