The fabric modules of modular Cisco MDS platforms are commonly called Xbars. There are two versions of these fabric modules:
Fabric 1 and Fabric 3. Frames that an FC port receives with a CRC error are dropped and not forwarded. Errors may also occur
inside the switch as frames move from component to component and module to module, including through the fabric modules.
Frame CRCs are checked at several places along the switching path. Once a frame error is detected, the frame is dropped as soon as possible.
The existing Internal CRC Detection and Isolation feature can detect and take corrective action when these internal CRC errors
occur. However, fabric modules can experience other errors that are not true CRC errors. The Fabric Module Error Monitoring
feature, introduced in Cisco NX-OS 9.3(1), complements the Internal CRC Detection and Isolation feature, and is designed to
detect and take corrective action in the presence of these errors. This feature allows automated monitoring and handling of
errors in Fabric 1 and Fabric 3 modules that might cause I/O problems in the fabric.
Fabric Module Error Monitoring is controlled by the xbarErrorMonitor CLI command. The command uses the MDS scheduler feature to check for internal errors. It creates a scheduler job named xbarErrorMonitor_job with an error checking script and a scheduler schedule named XbarErrorMonitor_Schedule. The scheduler periodically executes the script, which collects show hardware internal errors information for the configurable set of counters on each fabric module. After the configurable sleep time, the script collects the show hardware internal errors information again and calculates the change in each counter value. If any counter delta is equal to or higher than the configured threshold, the configured action is executed.
The log-only action logs syslog messages. The module is left in service and continues to switch traffic. The log-and-out-of-service action logs the same syslog messages and additionally puts the affected module out of service immediately, stopping the suspect device from affecting further traffic. This action provides real-time operational remediation until the module can be inspected for the root cause. If only one fabric module is left in service, it is not powered down.
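The check described above can be sketched in Python as follows. This is only an illustration of the delta-and-threshold logic, not the script that xbarErrorMonitor installs: the function names, the dictionary layout of the counter samples, and the threshold value used in the usage note are all assumptions made for the example. A counter missing from the first sample is treated as zero, on the assumption that zero-valued counters are not displayed.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

# Hypothetical layout: xbar number -> counter name -> counter value.
Counters = Dict[int, Dict[str, int]]


@dataclass
class Violation:
    xbar: int
    counter: str
    before: int
    after: int

    @property
    def delta(self) -> int:
        return self.after - self.before


def find_violations(before: Counters, after: Counters, threshold: int) -> List[Violation]:
    """Return every counter whose increase is equal to or higher than threshold."""
    violations = []
    for xbar, counters in after.items():
        for name, value in counters.items():
            # A counter absent from the first sample is assumed to have been zero.
            previous = before.get(xbar, {}).get(name, 0)
            if value - previous >= threshold:
                violations.append(Violation(xbar, name, previous, value))
    return violations


def modules_to_remove(violations: Iterable[Violation],
                      in_service: Iterable[int],
                      action: str) -> List[int]:
    """Pick the modules to take out of service, never removing the last one."""
    if action != "log-and-out-of-service":
        return []  # log-only: modules stay in service
    remaining = set(in_service)
    removed = []
    for xbar in sorted({v.xbar for v in violations}):
        if xbar in remaining and len(remaining) > 1:
            remaining.discard(xbar)
            removed.append(xbar)
    return removed


def run_once(collect: Callable[[], Counters],
             sleep_seconds: float,
             threshold: int,
             sleep: Callable[[float], None] = time.sleep) -> List[Violation]:
    """Sample the counters, wait for the sleep time, sample again, and compare."""
    before = collect()
    sleep(sleep_seconds)
    after = collect()
    return find_violations(before, after, threshold)
```

For example, with a hypothetical threshold of 100, a sample pair in which the counter packets dropped destined to port on xbar 1 rises from 0 to 128 yields one violation. With the log-and-out-of-service action, `modules_to_remove` returns `[1]` when xbars 1 and 2 are both in service, and an empty list when xbar 1 is the only module left.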
Fabric Module Error Monitoring generates the following type of syslog message detailing the affected module, switching ASIC,
error counter, and its value:
%USER-2-SYSTEM_MSG: xbarErrorMonitor: counter threshold exceeded for xbar 1 for counter packets dropped destined to port. (Before: 0, After: 128, Delta 128).
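When these messages are forwarded to a log collector, the fields can be extracted with a regular expression. The following Python sketch is based only on the single sample message shown above; the pattern, the function name `parse_xbar_alert`, and the returned dictionary keys are assumptions for illustration, and real messages may differ in ways the sample does not reveal.

```python
import re
from typing import Dict, Optional, Union

# Pattern derived from the one sample message above (an assumption, not a
# documented format). The colon after "Delta" is optional because the sample
# omits it, and whitespace before "(" may be a space or a line break.
ALERT_PATTERN = re.compile(
    r"xbarErrorMonitor: counter threshold exceeded for xbar (?P<xbar>\d+) "
    r"for counter (?P<counter>.+?)\.\s*"
    r"\(Before: (?P<before>\d+), After: (?P<after>\d+), Delta:? (?P<delta>\d+)\)"
)


def parse_xbar_alert(line: str) -> Optional[Dict[str, Union[int, str]]]:
    """Return the alert fields, or None if the line is not an xbarErrorMonitor alert."""
    match = ALERT_PATTERN.search(line)
    if match is None:
        return None
    return {
        "xbar": int(match.group("xbar")),
        "counter": match.group("counter"),
        "before": int(match.group("before")),
        "after": int(match.group("after")),
        "delta": int(match.group("delta")),
    }
```

Applied to the sample message, the function returns xbar 1, the counter name packets dropped destined to port, and the values 0, 128, and 128 for before, after, and delta.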
For information about xbarErrorMonitor command default values, refer to the Default Values section.
Contact the Cisco Technical Assistance Center (TAC) for diagnostic assistance and possible module replacement if internal
CRC errors are detected.
The following error counters can be monitored by Fabric Module Error Monitoring:
| Module | Counter | Description |
|---|---|---|
| Fabric 1 Module | INTERNAL_ERROR_CNT | Errors related to fabric link, input and output buffer full, and timeout events on fabric module. |
| Fabric 1 Module | HIGH_XT_DROP_CNT | Packets dropped due to fabric module packet switching timeout. |
| Fabric 1 Module | SAC_XTIMEOUT_INTR_HI | Packet timeouts due to fabric module egress port buffer full. |
| Fabric 1 Module | HIGH_NULL_POE_DROP_CNT | Packets dropped with empty fabric module egress port address. |
| Fabric 3 Module | packets dropped destined to port | Packets dropped due to fabric module egress port buffer full. |
| Fabric 3 Module | packets drop on receive port | Packets dropped on fabric module ingress port. |
| Fabric 3 Module | double bit ecc error | Packets dropped due to double bit ECC error in fabric module port buffers. |
| Fabric 3 Module | null fpoe port | Packets dropped with empty egress port address. |

Note: These counters can be displayed (if they are non-zero) using the show hardware internal errors command.