You can configure the high availability (HA) software framework and redundancy features using the CLI. These features include application restartability and nondisruptive supervisor switchability. Cisco high availability is a technology delivered in Cisco NX-OS software that enables network-wide resilience to increase IP network availability.
The Cisco MDS 9500 Series of multilayer directors and switches supports application restartability and nondisruptive supervisor switchability. The switches are protected from system failure by redundant hardware components and a high availability software framework.
The high availability (HA) software framework enables the following features:
Ensures Internal Cyclic Redundancy Check (CRC) detection and isolation on the Cisco MDS 9700 series switches.
Directors in the Cisco MDS 9500 Series have two supervisor modules (Supervisor-1 and Supervisor-2) in slots 5 and 6 (Cisco MDS 9509 and 9506 Switches) or slots 7 and 8 (Cisco MDS 9513 Switch). When the switch powers up and both supervisor modules are present, the supervisor module that comes up first enters the active mode, and the supervisor module that comes up second enters the standby mode. If both supervisor modules come up at the same time, Supervisor-1 becomes active. The standby supervisor module constantly monitors the active supervisor module. If the active supervisor module fails, the standby supervisor module takes over without any impact to user traffic.
Note | For high availability, you need to connect the Ethernet port for both active and standby supervisors to the same network or virtual LAN. The active supervisor owns the one IP address used by these Ethernet connections. On a switchover, the newly activated supervisor takes over this IP address. |
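The role-selection rule described above can be sketched in a few lines of Python. This is only an illustrative model, not NX-OS code: the `Supervisor` class, the `boot_time` field, and the function names are hypothetical and exist only to make the rule concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Supervisor:
    name: str         # for example "Supervisor-1"
    boot_time: float  # hypothetical: seconds after power-up when the module came up


def assign_roles(sup1: Supervisor, sup2: Supervisor):
    """Return (active, standby) for a chassis with both supervisors present.

    sup1 is Supervisor-1 and sup2 is Supervisor-2. The module that comes up
    first becomes active; if both come up at the same time, Supervisor-1 wins.
    """
    if sup2.boot_time < sup1.boot_time:
        return sup2, sup1
    return sup1, sup2


def switchover(active: Supervisor, standby: Supervisor) -> Supervisor:
    """Model an active-supervisor failure: the standby becomes the new active.

    The new active supervisor also takes over the single management IP address.
    """
    return standby
```

For example, `assign_roles(Supervisor("Supervisor-1", 5.0), Supervisor("Supervisor-2", 5.0))` returns Supervisor-1 as active because both modules came up at the same time.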
Beginning with the Cisco MDS NX-OS Release 6.2(13), the Internal Cyclic Redundancy Check (CRC) detection and isolation functionality is supported on the Cisco MDS 9700 series switches.
This functionality enables the Cisco MDS switches to detect CRC errors that occur internally within a switch and isolate the source of these errors.
Note | Internal CRC Detection and Isolation is supported only on the Cisco MDS 9700 Series Multilayer Directors. |
By default, the internal CRC detection and isolation is disabled.
The modules that support this functionality are:
Note | Module refers to either a switching module or a supervisor module. |
Internal CRC errors are a separate class of errors from CRC errors on frames that arrive from outside the switch. In store-and-forward mode, frames that arrive with CRC errors are dropped at the ingress port and do not propagate through the system. Internal CRC errors occur when frames are received without errors but are corrupted as they pass through the switching path.
Internal CRC errors are usually caused by a fault in the system. Such faults may be transient, such as an ungracefully removed module, or permanent, such as a badly seated module, or, in rare cases, a failing or failed hardware component. The rate of errors depends on many factors and may range from very high to very low.
The error-rate threshold is configurable as a system-wide value, but separate error counts are maintained for each module to identify an error source.
Note | The counters are reset 24 hours from the time the Internal CRC detection and isolation feature was first configured. |
Internal CRC errors may occur at five possible stages in a switch:
Stage 1—Ingress Buffer of a Module
Stage 2—Ingress Crossbar of a Module
Stage 3—Crossbar of a Fabric Module
Stage 4—Egress Crossbar of a Module
Stage 5—Egress Buffer of a Module
Errors on each module are handled individually when the error count exceeds the threshold.
Note | The total number of errors across all applicable ASICs on the module must exceed the threshold. |
When the error count exceeds the configured threshold, the switch logs the XBAR_MONITOR_INTERNAL_CRC_ERR syslog message. This message specifies the location of the error and the type of action taken.
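The counting rule can be illustrated with a small Python model: one system-wide threshold, a separate count per module, and a per-module total summed over its ASICs. This is a sketch under stated assumptions, not the switch's implementation. The `CrcMonitor` class and its method names are hypothetical, and a strict "greater than" comparison is assumed because the text says the count must exceed the threshold.

```python
from collections import defaultdict


class CrcMonitor:
    """Toy model of per-module internal CRC error accounting."""

    def __init__(self, threshold: int):
        # One threshold applies to the whole system.
        self.threshold = threshold
        # Counts are kept per module, broken down by ASIC.
        self.counts = defaultdict(lambda: defaultdict(int))

    def record(self, module: str, asic: str, errors: int = 1) -> None:
        """Add errors seen on one ASIC of one module."""
        self.counts[module][asic] += errors

    def module_total(self, module: str) -> int:
        """Sum the errors over all ASICs of a module."""
        return sum(self.counts[module].values())

    def modules_over_threshold(self) -> list:
        """Return the modules whose total exceeds the threshold."""
        return sorted(
            m for m in list(self.counts) if self.module_total(m) > self.threshold
        )

    def reset(self) -> None:
        """Clear all counters (modelled on the periodic counter reset)."""
        self.counts.clear()
```

With a threshold of 3, a module that logs two errors on each of two ASICs is flagged (total of 4), while a module with three errors on a single ASIC is not.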
Example: Error Messages
switch# show logging logfile | inc MONITOR_INTERNAL_CRC_ERR
2015 May 25 21:20:41 switch %XBAR-2-XBAR_MONITOR_INTERNAL_CRC_ERR: Module-1 detects CRC Error:4 at Egress Q-engine, putting it in failure state
2015 May 25 21:15:35 switch %XBAR-2-XBAR_MONITOR_INTERNAL_CRC_ERR: Fab_slot-12 detects CRC error:1 at ingress stage2, putting it in failure state
2015 May 25 15:47:10 switch %XBAR-2-XBAR_MONITOR_INTERNAL_CRC_ERR: Module-5 detects CRC error:2 at Ingress Qengine, Only one Sup is present, bringing down the active VSAN
2015 May 25 15:08:17 switch %XBAR-2-XBAR_MONITOR_INTERNAL_CRC_ERR: Module-5 detects CRC error:1 at Ingress Qengine, putting it in failure state
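If you collect these messages off-box, a short Python sketch such as the following can split each one into its source, location, and action. The regular expression is inferred only from the sample messages shown here, so other releases may format the text differently. The function name and dictionary keys are hypothetical, and because the source does not define the number after "CRC error:", it is returned as a plain `value` field.

```python
import re

# Pattern inferred from the sample messages in this section (an assumption,
# not a documented message grammar).
CRC_ERR_RE = re.compile(
    r"%XBAR-2-XBAR_MONITOR_INTERNAL_CRC_ERR: "
    r"(?P<source>\S+) detects CRC [Ee]rror:(?P<value>\d+) "
    r"at (?P<location>[^,]+), (?P<action>.+)$"
)


def parse_crc_syslog(line: str):
    """Return a dict describing one internal CRC syslog line, or None."""
    match = CRC_ERR_RE.search(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["value"] = int(fields["value"])
    return fields


sample = (
    "2015 May 25 15:47:10 switch %XBAR-2-XBAR_MONITOR_INTERNAL_CRC_ERR: "
    "Module-5 detects CRC error:2 at Ingress Qengine, "
    "Only one Sup is present, bringing down the active VSAN"
)
parsed = parse_crc_syslog(sample)
```

For the sample line, `parsed["source"]` is `"Module-5"`, `parsed["location"]` is `"Ingress Qengine"`, and `parsed["action"]` holds the rest of the message after the first comma.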
Stage 1—Ingress Buffer of a Module
There are multiple ingress buffers on each module. When the CRC error rate of an ingress buffer on a switching module reaches the threshold, the entire module is shut down. See Actions Taken on a Supervisor when the Threshold Exceeded for more information.
Stage 2—Ingress Crossbar of a Module
Ingress crossbar is an ASIC complex on an ingress module that switches traffic from ingress buffers to fabric modules. When the CRC error rate of an ingress switching module crossbar reaches the threshold, the entire module is shut down. See Actions Taken on a Supervisor when the Threshold Exceeded for more information.
Stage 3—Crossbar of a Fabric Module
Crossbar is an ASIC complex on a fabric module that switches traffic from an ingress module to an egress module.
When the CRC error rate of a crossbar reaches the threshold, if there is more than one fabric module in the corresponding switch, the host fabric module is shut down. If the switch has only one fabric module, the module connected to the fabric module link on which the errors occurred is shut down.
Stage 4—Egress Crossbar of a Module
Egress crossbar is an ASIC complex on an egress module that switches traffic from fabric modules to egress buffers. When the CRC error rate of an egress switching module crossbar reaches the threshold, the entire module is shut down. See Actions Taken on a Supervisor when the Threshold Exceeded for more information.
Stage 5—Egress Buffer of a Module
There are multiple egress buffers on each module. When the CRC error rate of an egress buffer on a switching module reaches the threshold, the entire module is shut down. See Actions Taken on a Supervisor when the Threshold Exceeded for more information.
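The per-stage behavior for switching and fabric modules can be summarized as a small decision function. The Python below is a sketch of the rules as described in this section only. The `Stage` enum, the `action_for` function, and the returned strings are hypothetical, and the separate handling for supervisor modules is deliberately left out.

```python
from enum import Enum


class Stage(Enum):
    INGRESS_BUFFER = 1
    INGRESS_CROSSBAR = 2
    FABRIC_CROSSBAR = 3
    EGRESS_CROSSBAR = 4
    EGRESS_BUFFER = 5


def action_for(stage: Stage, fabric_module_count: int = 1) -> str:
    """Describe the isolation action once the threshold is reached.

    Covers switching modules and fabric modules; supervisor modules follow
    their own rules and are not modelled here.
    """
    if stage is Stage.FABRIC_CROSSBAR:
        if fabric_module_count > 1:
            # Other fabric modules remain to carry traffic.
            return "shut down the fabric module hosting the crossbar"
        # With a single fabric module, isolate the module at the other end
        # of the fabric link on which the errors occurred.
        return "shut down the module connected to the affected fabric link"
    # Stages 1, 2, 4, and 5 all occur on a switching module.
    return "shut down the entire switching module"
```

Stage 3 is the only stage whose outcome depends on how many fabric modules are installed, which is why `fabric_module_count` is ignored for every other stage.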
Specific actions are taken on a supervisor when the threshold is exceeded during the following stages of internal CRC detection and isolation:
Stage 1—Ingress Buffer of a Module
Stage 2—Ingress Crossbar of a Module
Stage 4—Egress Crossbar of a Module
Stage 5—Egress Buffer of a Module
For information on configuring the Internal CRC Detection and Isolation feature, see Configuring Internal CRC Detection and Isolation.