Cisco UCS C-Series Servers Integrated Management Controller Faults Reference Guide
Faults Generated in CIMC

Table Of Contents

Faults Generated in CIMC

Chassis-Related Faults

fltEquipmentChassisThermalThresholdCritical

fltEquipmentChassisThermalThresholdNonCritical

fltEquipmentChassisThermalThresholdNonRecoverable

Fan-Related Faults

fltEquipmentFanDegraded

fltEquipmentFanInoperable

fltEquipmentFanModuleMissing

fltEquipmentFanPerfThresholdNonCritical

fltEquipmentFanPerfThresholdCritical

fltEquipmentFanPerfThresholdNonRecoverable

fltEquipmentFanMissing

I/O Module-Related Faults

fltEquipmentIOCardRemoved

fltEquipmentIOCardThermalProblem

fltEquipmentIOCardThermalThresholdNonCritical

fltEquipmentIOCardThermalThresholdCritical

fltEquipmentIOCardThermalThresholdNonRecoverable

Memory-Related Faults

fltMemoryUnitDegraded

fltMemoryUnitInoperable

fltMemoryUnitThermalThresholdNonCritical

fltMemoryUnitThermalThresholdCritical

fltMemoryUnitThermalThresholdNonRecoverable

fltMemoryArrayVoltageThresholdCritical

fltMemoryArrayVoltageThresholdNonRecoverable

fltMemoryUnitIdentityUnestablishable

Processor-Related Faults

fltProcessorUnitInoperable

fltProcessorUnitThermalNonCritical

fltProcessorUnitThermalThresholdCritical

fltProcessorUnitThermalThresholdNonRecoverable

fltProcessorUnitDisabled

Power Supply-Related Faults

fltEquipmentPsuInoperable

fltEquipmentPsuThermalThresholdNonCritical

fltEquipmentPsuThermalThresholdCritical

fltEquipmentPsuMissing

fltEquipmentPsuThermalThresholdNonRecoverable

fltEquipmentPsuVoltageThresholdCritical

fltEquipmentPsuVoltageThresholdNonRecoverable

fltEquipmentPsuPerfThresholdNonCritical

fltEquipmentPsuPerfThresholdCritical

fltEquipmentPsuPerfThresholdNonRecoverable

fltEquipmentPsuIdentity

fltPowerChassisMemberChassisPsuRedundanceFailure

fltEquipmentPsuPowerThreshold

fltEquipmentPsuInputError

Server-Related Faults

fltComputeBoardPowerError

fltComputePhysicalBiosPostTimeout

fltComputeBoardCmosVoltageThresholdCritical

fltComputeBoardCmosVoltageThresholdNonRecoverable

fltComputeIOHubThermalNonCritical

fltComputeIOHubThermalThresholdCritical

fltComputeIOHubThermalThresholdNonRecoverable

fltComputePhysicalPostFailure

fltComputeBoardPowerFail

fltComputeBoardThermalProblem

fltComputeBoardMotherBoardVoltageUpperThresholdCritical

fltComputeBoardPowerUsageProblem

fltComputeBoardMotherBoardVoltageThresholdUpperNonRecoverable

fltComputeBoardMotherBoardVoltageThresholdLowerNonRecoverable

fltComputeBoardMotherBoardVoltageLowerThresholdCritical

fltMemoryUnitECCThresholdNonCritical

fltMemoryUnitECCThresholdCritical

fltMemoryUnitECCThresholdNonRecoverable

Storage-Related Faults

fltStorageLocalDiskInoperable

fltStorageRaidBatteryInoperable

fltStorageLocalDiskCopybackFailed

fltStorageRaidBatteryDegraded

fltStorageRaidBatteryRelearnAborted

fltStorageRaidBatteryRelearnFailed

fltStorageVirtualDriveConsistencyCheckFailed

fltStorageVirtualDriveDegraded

fltStorageVirtualDriveInoperable

fltStorageVirtualDriveReconstructionFailed

fltStorageControllerInoperable

fltStorageControllerPatrolReadFailed

System Event Log-Related Faults

fltSysdebugMEpLogMEpLogVeryLow

fltSysdebugMEpLogMEpLogFull


Faults Generated in CIMC


This chapter provides information about the faults that may be raised in CIMC and reported in the CIMC Web UI.

This chapter includes the following sections:

Chassis-Related Faults

Fan-Related Faults

I/O Module-Related Faults

Memory-Related Faults

Processor-Related Faults

Power Supply-Related Faults

Server-Related Faults

Storage-Related Faults

System Event Log-Related Faults

Chassis-Related Faults

fltEquipmentChassisThermalThresholdCritical

Fault Code: F0409

Message:

Thermal condition on chassis [id] cause: [thermalStateQualifier]

Explanation:

This fault occurs under the following condition:

If a component within a chassis is operating outside the safe thermal operating range.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Check the temperature readings for the IOM and ensure they are within the recommended thermal safe operating range.

Step 2 If the fault reports a "Thermal Sensor threshold crossing in IOM" error for one or both IOMs, check if thermal faults have been raised against that IOM. Those faults include details of the thermal condition.

Step 3 If the fault reports a "Missing or Faulty Fan" error, check on the status of that fan. If it needs replacement, create a tech-support file for the chassis and contact Cisco TAC.

Step 4 If the above actions did not resolve the issue and the condition persists, create a tech-support file for the chassis and contact Cisco TAC.


Fault Details:

Severity: major
Cause: thermal-problem
mibFaultCode: 409
mibFaultName: fltEquipmentChassisThermalThresholdCritical
moClass: equipment:Chassis
Type: environmental
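
Each fault entry in this chapter ends with a Fault Details block of "key: value" lines. As an illustration only, the following sketch parses such a block into a dictionary and reproduces the pattern visible throughout this chapter, where the fault code is the letter F followed by the mibFaultCode padded to four digits. The function names are hypothetical and are not part of CIMC.

```python
def parse_fault_details(block: str) -> dict:
    """Parse 'Key: value' lines into a dict, splitting on the first colon only."""
    details = {}
    for line in block.strip().splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore lines that contain no colon
            details[key.strip()] = value.strip()
    return details


def fault_code_from_mib(mib_fault_code: int) -> str:
    """Format a mibFaultCode as a fault code (pattern observed in this chapter)."""
    return f"F{mib_fault_code:04d}"


# Fault Details copied from fltEquipmentChassisThermalThresholdCritical.
sample = """
Severity: major
Cause: thermal-problem
mibFaultCode: 409
mibFaultName: fltEquipmentChassisThermalThresholdCritical
moClass: equipment:Chassis
Type: environmental
"""

details = parse_fault_details(sample)
print(details["moClass"])
print(fault_code_from_mib(int(details["mibFaultCode"])))
```

Splitting on the first colon only keeps values such as equipment:Chassis intact.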

fltEquipmentChassisThermalThresholdNonCritical

Fault Code: F0410

Message:

Thermal condition on chassis [id] cause: [thermalStateQualifier]

Explanation:

This fault occurs under the following condition:

If a component within a chassis is operating outside the safe thermal operating range.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Check the temperature readings for the IOM and ensure they are within the recommended thermal safe operating range.

Step 2 If the fault reports a "Thermal Sensor threshold crossing in IOM" error for one or both IOMs, check if thermal faults have been raised against that IOM. Those faults include details of the thermal condition.

Step 3 If the fault reports a "Missing or Faulty Fan" error, check on the status of that fan. If it needs replacement, create a tech-support file for the chassis and contact Cisco TAC.

Step 4 If the above actions did not resolve the issue and the condition persists, create a tech-support file for the chassis and contact Cisco TAC.


Fault Details:

Severity: minor
Cause: thermal-problem
mibFaultCode: 410
mibFaultName: fltEquipmentChassisThermalThresholdNonCritical
moClass: equipment:Chassis
Type: environmental

fltEquipmentChassisThermalThresholdNonRecoverable

Fault Code: F0411

Message:

Thermal condition on chassis [id] cause: [thermalStateQualifier]

Explanation:

This fault occurs under the following condition:

If a component within a chassis is operating outside the safe thermal operating range.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Check the temperature readings for the IOM and ensure they are within the recommended thermal safe operating range.

Step 2 If the fault reports a "Thermal Sensor threshold crossing in IOM" error for one or both IOMs, check if thermal faults have been raised against that IOM. Those faults include details of the thermal condition.

Step 3 If the fault reports a "Missing or Faulty Fan" error, check on the status of that fan. If it needs replacement, create a tech-support file for the chassis and contact Cisco TAC.

Step 4 If the above actions did not resolve the issue and the condition persists, create a tech-support file for the chassis and contact Cisco TAC.


Fault Details:

Severity: critical
Cause: thermal-problem
mibFaultCode: 411
mibFaultName: fltEquipmentChassisThermalThresholdNonRecoverable
moClass: equipment:Chassis
Type: environmental
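
The three chassis thermal faults above share one message and one explanation and differ only in threshold level and severity. The following sketch, offered as an illustration rather than as CIMC behavior, records the severities listed above and picks the most severe of a set of raised faults. The ordering of severity levels from warning up to critical is an assumption of this sketch, and the function name is hypothetical.

```python
# Severities copied from the three chassis thermal fault entries above.
CHASSIS_THERMAL_SEVERITY = {
    "fltEquipmentChassisThermalThresholdNonCritical": "minor",
    "fltEquipmentChassisThermalThresholdCritical": "major",
    "fltEquipmentChassisThermalThresholdNonRecoverable": "critical",
}

# Assumed ordering, least to most severe; used only for comparison here.
SEVERITY_RANK = {"warning": 0, "minor": 1, "major": 2, "critical": 3}


def most_severe(fault_names):
    """Return the fault name with the highest severity among those given."""
    return max(
        fault_names,
        key=lambda name: SEVERITY_RANK[CHASSIS_THERMAL_SEVERITY[name]],
    )


raised = [
    "fltEquipmentChassisThermalThresholdNonCritical",
    "fltEquipmentChassisThermalThresholdCritical",
]
print(most_severe(raised))
```

Other fault families in this chapter assign different severities to the same threshold names, so the table is specific to the chassis thermal faults.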

Fan-Related Faults

fltEquipmentFanDegraded

Fault Code: F0371

Message:

Fan [id] in Fan Module: [operability]
Fan [id] in Fan Module [tray]-[id] under server [id] operability: [operability]

Explanation:

This fault occurs when one or more fans in a fan module are not operational, but at least one fan is operational.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Review the product specifications to determine the temperature operating range of the fan module.

Step 2 Review the Cisco UCS Site Preparation Guide and ensure the fan module has adequate airflow, including front and back clearance.

Step 3 Verify that the air flows are not obstructed.

Step 4 Verify that the site cooling system is operating properly.

Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.

Step 6 Replace the faulty fan modules.

Step 7 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


Fault Details:

Severity: minor
Cause: equipment-degraded
mibFaultCode: 371
mibFaultName: fltEquipmentFanDegraded
moClass: equipment:Fan
Type: equipment
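
The condition described for this fault (one or more fans not operational, but at least one still operational) can be expressed as a small classification. The sketch below is illustrative only; the function name and the label returned when no fan is operational are hypothetical and are not defined in this guide.

```python
def classify_fan_module(fans_operational):
    """Classify a fan module from per-fan booleans (True = operational)."""
    if not fans_operational:
        raise ValueError("fan module reports no fans")
    if all(fans_operational):
        return "operable"
    if any(fans_operational):
        # Condition described for fltEquipmentFanDegraded.
        return "degraded"
    # Hypothetical label for the remaining case; not taken from this guide.
    return "no-fan-operational"


print(classify_fan_module([True, False]))
```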

fltEquipmentFanInoperable

Fault Code: F0373

Message:

Fan [id] in Fan Module: [operability]
Fan [id] in Fan Module [tray]-[id] under server [id] operability: [operability]

Explanation:

This fault occurs if a fan is not operational.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Remove the fan module and reinstall it. Remove only one fan module at a time.

Step 2 Replace the fan module with a different fan module.

Step 3 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


Fault Details:

Severity: major
Cause: equipment-inoperable
mibFaultCode: 373
mibFaultName: fltEquipmentFanInoperable
moClass: equipment:Fan
Type: equipment

fltEquipmentFanModuleMissing

Fault Code: F0377

Message:

Fan module [tray]-[id] in server [id] presence: [presence]

Explanation:

This fault occurs if a fan module slot is not equipped or a fan module has been removed from its slot.

Recommended Action:

If you see this fault, take the following actions:


Step 1 If the reported slot is empty, insert a fan module into the slot.

Step 2 If the reported slot contains a fan module, remove and reinsert the fan module.

Step 3 If the above action did not resolve the issue, create a tech-support file and contact Cisco TAC.


Fault Details:

Severity: warning
Cause: equipment-missing
mibFaultCode: 377
mibFaultName: fltEquipmentFanModuleMissing
moClass: equipment:FanModule
Type: equipment

fltEquipmentFanPerfThresholdNonCritical

Fault Code: F0395

Message:

Fan [id] in Fan Module [tray]-[id] under server [id] speed: [perf]

Explanation:

This fault occurs when the fan speed reading from the fan controller does not match the desired fan speed and is outside of the normal operating range. This can indicate a problem with a fan or with the reading from the fan controller.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Monitor the fan status.

Step 2 If the problem persists for a long period of time or if other fans do not show the same problem, reseat the fan.

Step 3 Replace the fan module.

Step 4 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


Fault Details:

Severity: minor
Cause: performance-problem
mibFaultCode: 395
mibFaultName: fltEquipmentFanPerfThresholdNonCritical
moClass: equipment

fltEquipmentFanPerfThresholdCritical

Fault Code: F0396

Message:

Fan [id] in Fan Module [tray]-[id] under server [id] speed: [perf]

Explanation:

This fault occurs when the fan speed read from the fan controller does not match the desired fan speed and has exceeded the critical threshold, which puts the fan at risk of failure. This can indicate a problem with a fan or with the reading from the fan controller.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Monitor the fan status.

Step 2 If the problem persists for a long period of time or if other fans do not show the same problem, reseat the fan.

Step 3 If the above actions did not resolve the issue, create a tech-support file for the chassis and contact Cisco TAC.


Fault Details:

Severity: warning
Cause: performance-problem
mibFaultCode: 396
mibFaultName: fltEquipmentFanPerfThresholdCritical
moClass: equipment:

fltEquipmentFanPerfThresholdNonRecoverable

Fault Code: F0397

Message:

Fan [id] in Fan Module [tray]-[id] under server [id] speed: [perf]

Explanation:

This fault occurs when the fan speed read from the fan controller has far exceeded the desired fan speed. It frequently indicates that the fan has failed.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Replace the fan.

Step 2 If the above action did not resolve the issue, create a tech-support file and contact Cisco TAC.


Fault Details:

Severity: major
Cause: performance-problem
mibFaultCode: 397
mibFaultName: fltEquipmentFanPerfThresholdNonRecoverable
moClass: equipment:Fan
Type: equipment

fltEquipmentFanMissing

Fault Code: F0434

Message:

Fan [id] in Fan Module [tray]-[id] under server [id] presence: [presence]

Explanation:

This fault occurs in the unlikely event that a fan in a fan module cannot be detected.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Insert/reinsert the fan module in the slot that is reporting the issue.

Step 2 Replace the fan module with a different fan module, if available.

Step 3 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


Fault Details:

Severity: warning
Cause: equipment-missing
mibFaultCode: 434
mibFaultName: fltEquipmentFanMissing
moClass: equipment:Fan
Type: equipment
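
When a fan fault is reported, its code can be matched against the entries in this section. The following sketch builds a small lookup table from the fault codes, names, severities, and causes listed above; the FaultEntry type and the lookup function are hypothetical helpers for illustration, not a CIMC interface.

```python
from typing import NamedTuple, Optional


class FaultEntry(NamedTuple):
    name: str
    severity: str
    cause: str


# Values copied from the fan-related fault entries in this section.
FAN_FAULTS = {
    "F0371": FaultEntry("fltEquipmentFanDegraded", "minor", "equipment-degraded"),
    "F0373": FaultEntry("fltEquipmentFanInoperable", "major", "equipment-inoperable"),
    "F0377": FaultEntry("fltEquipmentFanModuleMissing", "warning", "equipment-missing"),
    "F0395": FaultEntry("fltEquipmentFanPerfThresholdNonCritical", "minor", "performance-problem"),
    "F0396": FaultEntry("fltEquipmentFanPerfThresholdCritical", "warning", "performance-problem"),
    "F0397": FaultEntry("fltEquipmentFanPerfThresholdNonRecoverable", "major", "performance-problem"),
    "F0434": FaultEntry("fltEquipmentFanMissing", "warning", "equipment-missing"),
}


def lookup_fan_fault(code: str) -> Optional[FaultEntry]:
    """Return the entry for a fault code, or None if it is not listed here."""
    return FAN_FAULTS.get(code.strip().upper())


entry = lookup_fan_fault("f0373")
print(entry.name, entry.severity)
```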

I/O Module-Related Faults

fltEquipmentIOCardRemoved

Fault Code: F0376

Message:

[side] IOM [chassisId]/[id] is removed.

Explanation:

This fault typically occurs because an I/O module is removed from the chassis. For a standalone configuration, the chassis associated with the I/O module loses network connectivity. This is a critical fault because it can result in the loss of network connectivity and disrupt data traffic through the I/O module.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Re-seat/re-insert the I/O module.

Step 2 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


Fault Details:

Severity: critical
Cause: equipment-removed
mibFaultCode: 376
mibFaultName: fltEquipmentIOCardRemoved
moClass: equipment:IOCard
Type: equipment
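
The Message field of each fault is a template whose bracketed names, such as [side] and [chassisId], stand for values supplied when the fault is raised. As a rough sketch of how such a template could be filled in, the code below substitutes placeholders using a regular expression. The function name and the sample values are invented for illustration.

```python
import re

# Matches a bracketed placeholder such as [chassisId].
PLACEHOLDER = re.compile(r"\[(\w+)\]")


def render_message(template: str, values: dict) -> str:
    """Replace each [name] with values[name]; unknown names are left as-is."""
    return PLACEHOLDER.sub(
        lambda m: str(values.get(m.group(1), m.group(0))), template
    )


template = "[side] IOM [chassisId]/[id] is removed."
# Sample values are made up for this example.
print(render_message(template, {"side": "left", "chassisId": 1, "id": 2}))
```

Leaving unknown placeholders untouched makes a missing value visible in the rendered text instead of silently dropping it.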

fltEquipmentIOCardThermalProblem

Fault Code: F0379

Message:

[side] IOM [chassisId]/[id] operState: [operState]

Explanation:

This fault occurs when there is a thermal problem on an I/O module. Be aware of the following possible contributing factors:

Temperature extremes can cause Cisco UCS equipment to operate at reduced efficiency and cause a variety of problems, including early degradation, failure of chips, and failure of equipment. In addition, extreme temperature fluctuations can cause CPUs to become loose in their sockets.

Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).

Recommended Action:

If you see this fault, take the following actions:


Step 1 Review the product specifications to determine the temperature operating range of the I/O module.

Step 2 Review the Cisco UCS Site Preparation Guide to ensure the I/O modules have adequate airflow, including front and back clearance.

Step 3 Verify that the air flows on the Cisco UCS chassis are not obstructed.

Step 4 Verify that the site cooling system is operating properly.

Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.

Step 6 Replace faulty I/O modules.

Step 7 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


Fault Details:

Severity: major
Cause: thermal-problem
mibFaultCode: 379
mibFaultName: fltEquipmentIOCardThermalProblem
moClass: equipment:IOCard
Type: environmental

fltEquipmentIOCardThermalThresholdNonCritical

Fault Code: F0729

Message:

[side] IOM [chassisId]/[id] ([switchId]) temperature: [thermal]

Explanation:

This fault occurs when the temperature of an I/O module has exceeded a non-critical threshold value, but is still below the critical threshold. Be aware of the following possible contributing factors:

Temperature extremes can cause Cisco UCS equipment to operate at reduced efficiency and cause a variety of problems, including early degradation, failure of chips, and failure of equipment. In addition, extreme temperature fluctuations can cause CPUs to become loose in their sockets.

Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).

If sensors on a CPU reach 179.6F (82C), the system will take that CPU offline.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Review the product specifications to determine the temperature operating range of the I/O module.

Step 2 Verify that the air flows on the Cisco UCS chassis and I/O module are not obstructed.

Step 3 Verify that the site cooling system is operating properly.

Step 4 Power off unused rack servers.

Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.

Step 6 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


Fault Details:

Severity: minor
Cause: thermal-problem
mibFaultCode: 729
mibFaultName: fltEquipmentIOCardThermalThresholdNonCritical
moClass: equipment:IOCard
Type: environmental
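
The temperature figures quoted in the explanation above (inlet air between 10C and 35C, and a CPU taken offline at 82C) can be checked with a few lines of arithmetic. The sketch below is illustrative; the constant and function names are hypothetical, and only the numeric limits come from this guide.

```python
INLET_MIN_C = 10.0    # 50F, lower inlet air limit stated in this guide
INLET_MAX_C = 35.0    # 95F, upper inlet air limit stated in this guide
CPU_OFFLINE_C = 82.0  # 179.6F, CPU sensor reading at which the CPU goes offline


def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9.0 / 5.0 + 32.0


def inlet_within_range(inlet_c: float) -> bool:
    """True if the inlet air temperature is inside the recommended range."""
    return INLET_MIN_C <= inlet_c <= INLET_MAX_C


def cpu_would_go_offline(sensor_c: float) -> bool:
    """True if a CPU sensor reading has reached the offline temperature."""
    return sensor_c >= CPU_OFFLINE_C


print(celsius_to_fahrenheit(INLET_MAX_C))
print(inlet_within_range(36.0))
```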

fltEquipmentIOCardThermalThresholdCritical

Fault Code: F0730

Message:

[side] IOM [chassisId]/[id] ([switchId]) temperature: [thermal]

Explanation:

This fault occurs when the temperature of an I/O module has exceeded a critical threshold value. Be aware of the following possible contributing factors:

Temperature extremes can cause Cisco UCS equipment to operate at reduced efficiency and cause a variety of problems, including early degradation, failure of chips, and failure of equipment. In addition, extreme temperature fluctuations can cause CPUs to become loose in their sockets.

Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).

If sensors on a CPU reach 179.6F (82C), the system will take that CPU offline.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Review the product specifications to determine the temperature operating range of the I/O module.

Step 2 Verify that the site cooling system is operating properly.

Step 3 Power off unused rack servers.

Step 4 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.

Step 5 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


Fault Details:

Severity: major
Cause: thermal-problem
mibFaultCode: 730
mibFaultName: fltEquipmentIOCardThermalThresholdCritical
moClass: equipment:IOCard
Type: environmental

fltEquipmentIOCardThermalThresholdNonRecoverable

Fault Code: F0731

Message:

[side] IOM [chassisId]/[id] temperature: [thermal]

Explanation:

This fault occurs when the temperature of an I/O module has been out of the operating range, and the issue is not recoverable. Be aware of the following possible contributing factors:

Temperature extremes can cause Cisco UCS equipment to operate at reduced efficiency and cause a variety of problems, including early degradation, failure of chips, and failure of equipment. In addition, extreme temperature fluctuations can cause CPUs to become loose in their sockets.

Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).

If sensors on a CPU reach 179.6F (82C), the system will take that CPU offline.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Review the product specifications to determine the temperature operating range of the I/O module.

Step 2 Verify that the air flows on the Cisco UCS chassis and I/O module are not obstructed.

Step 3 Verify that the site cooling system is operating properly.

Step 4 Power off unused rack servers.

Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.

Step 6 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


Fault Details:

Severity: critical
Cause: thermal-problem
mibFaultCode: 731
mibFaultName: fltEquipmentIOCardThermalThresholdNonRecoverable
moClass: equipment:IOCard
Type: environmental

Memory-Related Faults

fltMemoryUnitDegraded

Fault Code: F0184

Message:

DIMM [location] on server [chassisId]/[slotId] operability: [operability]
DIMM [location] on server [id] operability: [operability]

Explanation:

This fault occurs when a DIMM is in a degraded operability state. This state typically occurs when an excessive number of correctable ECC errors are reported on the DIMM by the server BIOS.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Monitor the DIMM for further ECC errors. If the high number of errors persists, there is a high possibility of the DIMM becoming inoperable.

Step 2 If the DIMM becomes inoperable, replace the DIMM.

Step 3 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


Fault Details:

Severity: warning
Cause: equipment-degraded
mibFaultCode: 184
mibFaultName: fltMemoryUnitDegraded
moClass: memory:Unit
Type: equipment

fltMemoryUnitInoperable

Fault Code: F0185

Message:

DIMM [location] on server [chassisId]/[slotId] operability: [operability]
DIMM [location] on server [id] operability: [operability]

Explanation:

This fault typically occurs because an above-threshold number of correctable or uncorrectable errors has occurred on a DIMM. The DIMM may be inoperable.

Recommended Action:

If you see this fault, take the following actions:


Step 1 If the SEL is enabled, review the SEL statistics on the DIMM to determine which threshold was crossed.

Step 2 If necessary, replace the DIMM.

Step 3 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


Fault Details:

Severity: major
Cause: equipment-inoperable
mibFaultCode: 185
mibFaultName: fltMemoryUnitInoperable
moClass: memory:Unit

fltMemoryUnitThermalThresholdNonCritical

Fault Code: F0186

Message:

DIMM [location] on server [chassisId]/[slotId] temperature: [thermal]
DIMM [location] on server [id] temperature: [thermal]

Explanation:

This fault occurs when the temperature of a memory unit on a rack server exceeds a non-critical threshold value, but is still below the critical threshold. Be aware of the following possible contributing factors:

Temperature extremes can cause Cisco UCS equipment to operate at reduced efficiency and cause a variety of problems, including early degradation, failure of chips, and failure of equipment. In addition, extreme temperature fluctuations can cause CPUs to become loose in their sockets.

Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).

If sensors on a CPU reach 179.6F (82C), the system will take that CPU offline.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Review the product specifications to determine the temperature operating range of the server.

Step 2 Review the Cisco UCS Site Preparation Guide to ensure the servers have adequate airflow, including front and back clearance.

Step 3 Verify that the air flows on the Cisco UCS chassis or rack server are not obstructed.

Step 4 Verify that the site cooling system is operating properly.

Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.

Step 6 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


Fault Details:

Severity: minor
Cause: thermal-problem
mibFaultCode: 186
mibFaultName: fltMemoryUnitThermalThresholdNonCritical
moClass: memory:Unit
Type: environmental

fltMemoryUnitThermalThresholdCritical

Fault Code: F0187

Message:

DIMM [location] on server [chassisId]/[slotId] temperature: [thermal]
DIMM [location] on server [id] temperature: [thermal]

Explanation:

This fault occurs when the temperature of a memory unit on a rack server exceeds a critical threshold value. Be aware of the following possible contributing factors:

Temperature extremes can cause Cisco UCS equipment to operate at reduced efficiency and cause a variety of problems, including early degradation, failure of chips, and failure of equipment. In addition, extreme temperature fluctuations can cause CPUs to become loose in their sockets.

Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).

If sensors on a CPU reach 179.6F (82C), the system will take that CPU offline.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Review the product specifications to determine the temperature operating range of the server.

Step 2 Review the Cisco UCS Site Preparation Guide to ensure the servers have adequate airflow, including front and back clearance.

Step 3 Verify that the air flows on the Cisco UCS chassis or rack server are not obstructed.

Step 4 Verify that the site cooling system is operating properly.

Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.

Step 6 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.

Fault Details:

Severity: warning
Cause: thermal-problem
mibFaultCode: 187
mibFaultName: fltMemoryUnitThermalThresholdCritical
moClass: memory:Unit
Type: environmental

fltMemoryUnitThermalThresholdNonRecoverable

Fault Code: F0188

Message:

DIMM [location] on server [chassisId]/[slotId] temperature: [thermal]
DIMM [location] on server [id] temperature: [thermal]

Explanation:

This fault occurs when the temperature of a memory unit on a rack server has been out of the operating range, and the issue is not recoverable. Be aware of the following possible contributing factors:

Temperature extremes can cause Cisco UCS equipment to operate at reduced efficiency and cause a variety of problems, including early degradation, failure of chips, and failure of equipment. In addition, extreme temperature fluctuations can cause CPUs to become loose in their sockets.

Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).

If sensors on a CPU reach 179.6F (82C), the system will take that CPU offline.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Review the product specifications to determine the temperature operating range of the server.

Step 2 Review the Cisco UCS Site Preparation Guide to ensure the servers have adequate airflow, including front and back clearance.

Step 3 Verify that the air flows on the Cisco UCS chassis or rack server are not obstructed.

Step 4 Verify that the site cooling system is operating properly.

Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.

Step 6 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


Fault Details:

Severity: major
Cause: thermal-problem
mibFaultCode: 188
mibFaultName: fltMemoryUnitThermalThresholdNonRecoverable
moClass: memory:Unit
Type: environmental

fltMemoryArrayVoltageThresholdCritical

Fault Code: F0190

Message:

Memory array [id] on server [chassisId]/[slotId] voltage: [voltage]
Memory array [id] on server [id] voltage: [voltage]

Explanation:

This fault occurs when the memory array voltage exceeds the specified hardware voltage rating.

Recommended Action:

If you see this fault, take the following actions:


Step 1 If the SEL is enabled, look at the SEL statistics on the DIMM to determine which threshold was crossed.

Step 2 Monitor the memory array for further degradation.

Step 3 Replace the power supply.

Step 4 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


Fault Details:

Severity: major
Cause: voltage-problem
mibFaultCode: 190
mibFaultName: fltMemoryArrayVoltageThresholdCritical
moClass: memory:Array

fltMemoryArrayVoltageThresholdNonRecoverable

Fault Code: F0191

Message:

Memory array [id] on server [chassisId]/[slotId] voltage: [voltage]
Memory array [id] on server [id] voltage: [voltage]

Explanation:

This fault occurs when the memory array voltage exceeds the specified hardware voltage rating and the memory hardware may be damaged or in jeopardy.

Recommended Action:

If you see this fault, take the following actions:


Step 1 If the SEL is enabled, review the SEL statistics on the DIMM to determine which threshold was crossed.

Step 2 Monitor the memory array for further degradation.

Step 3 Replace the power supply.

Step 4 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


Fault Details:

Severity: critical
Cause: voltage-problem
mibFaultCode: 191
mibFaultName: fltMemoryArrayVoltageThresholdNonRecoverable
moClass: memory:Array
Type: environmental

fltMemoryUnitIdentityUnestablishable

Fault Code: F0502

Message:

DIMM [location] on server [chassisId]/[slotId] has an invalid FRU
DIMM [location] on server [id] has an invalid FRU

Explanation:

This fault typically occurs when a sensor has detected an unsupported DIMM in the server. For example, the model, vendor, or revision is not recognized.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Verify if the DIMM is supported on the server configuration.

Step 2 If the above action did not resolve the issue, you may have unsupported DIMMs or DIMM configuration in the server. Contact Cisco TAC.


Fault Details:

Severity: warning
Cause: identity-unestablishable
mibFaultCode: 502
mibFaultName: fltMemoryUnitIdentityUnestablishable
moClass: memory:Unit
Type: equipment

Processor-Related Faults

fltProcessorUnitInoperable

Fault Code: F0174

Message:

Processor [id] on server [chassisId]/[slotId] operability: [operability]

Explanation:

This fault occurs when the processor encounters a catastrophic error or exceeds preset thermal or power thresholds.

Recommended Action:

If you see this fault, take the following actions:


Step 1 If the indicated probable cause is a thermal problem, verify that the airflow to the server is not obstructed and that the server is adequately ventilated. If possible, check that the heat sink is properly seated on the processor.

Step 2 If the indicated probable cause is equipment inoperable, contact Cisco TAC for further instructions.

Step 3 If the indicated probable cause is a power or voltage problem, check whether the issue is resolved with an alternate power supply. If this fails to resolve the issue, contact Cisco TAC.


Fault Details:

Severity: major
Cause: equipment-inoperable
mibFaultCode: 174
mibFaultName: fltProcessorUnitInoperable
moClass: processor:Unit
Type: equipment
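
The recommended action for this fault depends on the probable cause that is indicated. As a sketch under stated assumptions, the lookup below maps a cause string to the matching step. The cause strings follow the style of the Cause fields in this chapter, but treating them as the probable-cause input is an assumption, as is the fallback text, and the function name is hypothetical.

```python
# Cause strings follow the style of the Cause fields in this chapter; using
# them as the "probable cause" input is an assumption of this sketch.
ACTIONS_BY_CAUSE = {
    "thermal-problem": (
        "Check that airflow to the server is not obstructed and that the "
        "heat sink is properly seated on the processor."
    ),
    "equipment-inoperable": "Contact Cisco TAC.",
    "voltage-problem": (
        "Try an alternate power supply; if that fails, contact Cisco TAC."
    ),
}

# Assumed fallback, modeled on the closing step of other entries.
DEFAULT_ACTION = "Create a tech-support file and contact Cisco TAC."


def recommended_action(cause: str) -> str:
    """Return the step that matches the probable cause, or the fallback."""
    return ACTIONS_BY_CAUSE.get(cause, DEFAULT_ACTION)


print(recommended_action("voltage-problem"))
```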

fltProcessorUnitThermalNonCritical

Fault Code: F0175

Message:

Processor [id] on server [chassisId]/[slotId] temperature: [thermal]Processor [id] on server [id] temperature: [thermal]

Explanation:

This fault occurs when the processor temperature on a rack server exceeds a non-critical threshold value, but is still below the critical threshold. Be aware of the following possible contributing factors:

Temperature extremes can cause Cisco UCS equipment to operate at reduced efficiency and cause a variety of problems, including early degradation, failure of chips, and failure of equipment. In addition, extreme temperature fluctuations can cause CPUs to become loose in their sockets.

Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).

If sensors on a CPU reach 179.6F (82C), the system will take that CPU offline.
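
The temperature figures above can be expressed as simple checks. The following Python sketch uses only the values quoted in this entry (inlet air between 10C and 35C, CPU taken offline at 82C). The function and constant names are hypothetical, and actual sensor thresholds should be taken from the product specifications.

```python
INLET_MIN_C = 10.0    # 50F, lower inlet air limit quoted above
INLET_MAX_C = 35.0    # 95F, upper inlet air limit quoted above
CPU_OFFLINE_C = 82.0  # 179.6F, reading at which the CPU is taken offline


def c_to_f(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9.0 / 5.0 + 32.0


def inlet_in_range(inlet_c: float) -> bool:
    """Return True if the inlet air temperature is within the quoted range."""
    return INLET_MIN_C <= inlet_c <= INLET_MAX_C


def cpu_will_go_offline(cpu_c: float) -> bool:
    """Return True if the CPU sensor reading has reached the offline limit."""
    return cpu_c >= CPU_OFFLINE_C
```

An inlet reading of 38C, for instance, falls outside the quoted range, so inlet_in_range(38.0) is False.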

Recommended Action:

If you see this fault, take the following actions:


Step 1 Review the product specifications to determine the temperature operating range of the server.

Step 2 Review the Cisco UCS Site Preparation Guide to ensure the servers have adequate airflow, including front and back clearance.

Step 3 Verify that airflow to the Cisco UCS chassis or rack server is not obstructed.

Step 4 Verify that the site cooling system is operating properly.

Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.

Step 6 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


Fault Details:

Severity: minor
Cause: thermal-problem
mibFaultCode: 175
mibFaultName: fltProcessorUnitThermalNonCritical
moClass: processor:Unit
Type: environmental
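
The message for this fault is a template whose bracketed names, such as [id] and [thermal], are replaced with values when the fault is raised. The following Python sketch shows one way such a template could be filled in. The render_message helper and the sample values are hypothetical and do not reflect how CIMC itself formats messages.

```python
import re

# Matches placeholders of the form [name], as used in the fault messages.
PLACEHOLDER = re.compile(r"\[(\w+)\]")


def render_message(template: str, values: dict) -> str:
    """Replace each [name] placeholder with values[name] when it is known."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        # Unknown placeholders are left as-is rather than guessed.
        return str(values.get(name, match.group(0)))

    return PLACEHOLDER.sub(substitute, template)


template = "Processor [id] on server [chassisId]/[slotId] temperature: [thermal]"
# Sample values are made up for illustration.
sample = {"id": 1, "chassisId": 1, "slotId": 2, "thermal": "upper-non-critical"}
message = render_message(template, sample)
```

Leaving unknown placeholders untouched makes a missing value visible in the rendered text instead of silently dropping it.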

fltProcessorUnitThermalThresholdCritical

Fault Code: F0176

Message:

Processor [id] on server [chassisId]/[slotId] temperature: [thermal]

Processor [id] on server [id] temperature: [thermal]

Explanation:

This fault occurs when the processor temperature on a rack server exceeds a critical threshold value. Be aware of the following possible contributing factors:

Temperature extremes can cause Cisco UCS equipment to operate at reduced efficiency and cause a variety of problems, including early degradation, failure of chips, and failure of equipment. In addition, extreme temperature fluctuations can cause CPUs to become loose in their sockets.

Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).

If sensors on a CPU reach 179.6F (82C), the system will take that CPU offline.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Review the product specifications to determine the temperature operating range of the server.

Step 2 Review the Cisco UCS Site Preparation Guide to ensure the servers have adequate airflow, including front and back clearance.

Step 3 Verify that airflow to the Cisco UCS chassis or rack server is not obstructed.

Step 4 Verify that the site cooling system is operating properly.

Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.

Step 6 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


Fault Details:

Severity: critical
Cause: thermal-problem
mibFaultCode: 176
mibFaultName: fltProcessorUnitThermalThresholdCritical
moClass: processor:Unit
Type: environmental

fltProcessorUnitThermalThresholdNonRecoverable

Fault Code: F0177

Message:

Processor [id] on server [chassisId]/[slotId] temperature: [thermal]

Processor [id] on server [id] temperature: [thermal]

Explanation:

This fault occurs when the processor temperature on a rack server has been out of the operating range, and the issue is not recoverable. Be aware of the following possible contributing factors:

Temperature extremes can cause Cisco UCS equipment to operate at reduced efficiency and cause a variety of problems, including early degradation, failure of chips, and failure of equipment. In addition, extreme temperature fluctuations can cause CPUs to become loose in their sockets.

Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).

If sensors on a CPU reach 179.6F (82C), the system will take that CPU offline.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Review the product specifications to determine the temperature operating range of the server.

Step 2 Review the Cisco UCS Site Preparation Guide to ensure the servers have adequate airflow, including front and back clearance.

Step 3 Verify that airflow to the Cisco UCS chassis or rack server is not obstructed.

Step 4 Verify that the site cooling system is operating properly.

Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.

Step 6 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


Fault Details:

Severity: non-recoverable
Cause: thermal-problem
mibFaultCode: 177
mibFaultName: fltProcessorUnitThermalThresholdNonRecoverable
moClass: processor:Unit
Type: environmental

fltProcessorUnitDisabled

Fault Code: F0842

Message:

Processor [id] on server [chassisId]/[slotId] operState: [operState]

Processor [id] on server [id] operState: [operState]

Explanation:

This fault occurs in the unlikely event that a processor is disabled.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Remove the server from the chassis and reinsert it.

Step 2 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


Fault Details:

Severity: info
Cause: equipment-disabled
mibFaultCode: 842
mibFaultName: fltProcessorUnitDisabled
moClass: processor:Unit
Type: environmental
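
When several processor faults are active at once, it can help to handle the most severe one first. The following Python sketch collects the five processor-related faults from this section with the severities listed in their Fault Details and picks the most severe. The relative ranking of the severity names is an assumption made for illustration, as this guide does not define an ordering, and the helper name is hypothetical.

```python
# Assumed ranking, lowest to highest; not defined by this guide.
SEVERITY_RANK = {
    "info": 0,
    "warning": 1,
    "minor": 2,
    "major": 3,
    "critical": 4,
    "non-recoverable": 5,
}

# Fault code -> (mibFaultName, severity), as listed in this section.
PROCESSOR_FAULTS = {
    "F0174": ("fltProcessorUnitInoperable", "major"),
    "F0175": ("fltProcessorUnitThermalNonCritical", "minor"),
    "F0176": ("fltProcessorUnitThermalThresholdCritical", "critical"),
    "F0177": ("fltProcessorUnitThermalThresholdNonRecoverable", "non-recoverable"),
    "F0842": ("fltProcessorUnitDisabled", "info"),
}


def most_severe(fault_codes):
    """Return the known processor fault code with the highest assumed rank."""
    known = [code for code in fault_codes if code in PROCESSOR_FAULTS]
    if not known:
        return None
    return max(known, key=lambda code: SEVERITY_RANK[PROCESSOR_FAULTS[code][1]])
```

Codes that are not processor faults are ignored, so the function returns None when nothing in the input is recognized.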

Power Supply-Related Faults

fltEquipmentPsuInoperable

Fault Code: F0374

Message:

[operability]Power supply [id] in server [id] operability: [operability]

Explanation:

This fault typically occurs when the power supply unit is either offline or the input/output voltage is out of range.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Verify that the power cord is properly connected to the PSU and the power source.

Step 2 Verify that the power source is 220 volts.

Step 3 Remove the PSU and reinstall it.

Step 4 Replace the PSU.

Step 5 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


Fault Details:

Severity: major
Cause: equipment-inoperable
mibFaultCode: 374
mibFaultName: fltEquipmentPsuInoperable
moClass: equipment:Psu
Type: equipment

fltEquipmentPsuThermalThresholdNonCritical

Fault Code: F0381

Message:

[thermal]Power supply [id] in server [id] temperature: [thermal]

Explanation:

This fault occurs when the temperature of a PSU module has exceeded a non-critical threshold value, but is still below the critical threshold. Be aware of the following possible contributing factors:

Temperature extremes can cause Cisco UCS equipment to operate at reduced efficiency and cause a variety of problems, including early degradation, failure of chips, and failure of equipment. In addition, extreme temperature fluctuations can cause CPUs to become loose in their sockets.

Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).

Recommended Action:

If you see this fault, take the following actions:


Step 1 Review the product specifications to determine the temperature operating range of the PSU module.

Step 2 Review the Cisco UCS Site Preparation Guide to ensure the PSU modules have adequate airflow, including front and back clearance.

Step 3 Verify that airflow is not obstructed.

Step 4 Verify that the site cooling system is operating properly.

Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.

Step 6 Replace faulty PSU modules.

Step 7 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


Fault Details:

Severity: minor
Cause: thermal-problem
mibFaultCode: 381
mibFaultName: fltEquipmentPsuThermalThresholdNonCritical
moClass: equipment:Psu
Type: environmental

fltEquipmentPsuThermalThresholdCritical

Fault Code: F0383

Message:

[thermal]Power supply [id] in server [id] temperature: [thermal]

Explanation:

This fault occurs when the temperature of a PSU module has exceeded a critical threshold value. Be aware of the following possible contributing factors:

Temperature extremes can cause Cisco UCS equipment to operate at reduced efficiency and cause a variety of problems, including early degradation, failure of chips, and failure of equipment. In addition, extreme temperature fluctuations can cause CPUs to become loose in their sockets.

Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).

Recommended Action:

If you see this fault, take the following actions:


Step 1 Review the product specifications to determine the temperature operating range of the PSU module.

Step 2 Review the Cisco UCS Site Preparation Guide to ensure the PSU modules have adequate airflow, including front and back clearance.

Step 3 Verify that airflow is not obstructed.

Step 4 Verify that the site cooling system is operating properly.

Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.

Step 6 Replace faulty PSU modules.

Step 7 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


Fault Details:

Severity: warning
Cause: thermal-problem
mibFaultCode: 383
mibFaultName: fltEquipmentPsuThermalThresholdCritical
moClass: equipment:Psu
Type: environmental

fltEquipmentPsuMissing

Fault Code: F0378

Message:

[presence]Power supply [id] in server [id] presence: [presence]

Explanation:

This fault typically occurs when the power supply module is either missing or the input power to the server is absent.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Check to see if the power supply is connected to a power source.

Step 2 If the PSU is physically present in the slot, remove and then reinsert it.

Step 3 If the PSU is not physically present in the slot, insert a new PSU.

Step 4 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


Fault Details:

Severity: warning
Cause: equipment-missing
mibFaultCode: 378
mibFaultName: fltEquipmentPsuMissing
moClass: equipment:Psu
Type: equipment

fltEquipmentPsuThermalThresholdNonRecoverable

Fault Code: F0385

Message:

[thermal]Power supply [id] in server [id] temperature: [thermal]

Explanation:

This fault occurs when the temperature of a PSU module has been out of operating range, and the issue is not recoverable. Be aware of the following possible contributing factors:

Temperature extremes can cause Cisco UCS equipment to operate at reduced efficiency and cause a variety of problems, including early degradation, failure of chips, and failure of equipment. In addition, extreme temperature fluctuations can cause CPUs to become loose in their sockets.

Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).

Recommended Action:

If you see this fault, take the following actions:


Step 1 Review the product specifications to determine the temperature operating range of the PSU module.

Step 2 Review the Cisco UCS Site Preparation Guide to ensure the PSU modules have adequate airflow, including front and back clearance.

Step 3 Verify that airflow is not obstructed.

Step 4 Verify that the site cooling system is operating properly.

Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.

Step 6 Replace faulty PSU modules.

Step 7 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


Fault Details:

Severity: major
Cause: thermal-problem
mibFaultCode: 385
mibFaultName: fltEquipmentPsuThermalThresholdNonRecoverable
moClass: equipment:Psu
Type: environmental

fltEquipmentPsuVoltageThresholdCritical

Fault Code: F0389

Message:

[voltage]Power supply [id] in server [id] voltage: [voltage]

Explanation:

This fault occurs when the PSU voltage has exceeded the specified hardware voltage rating.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Remove and reseat the PSU.

Step 2 If the above action did not resolve the issue, create a tech-support file and contact Cisco TAC.


Fault Details:

Severity: warning
Cause: voltage-problem
mibFaultCode: 389
mibFaultName: fltEquipmentPsuVoltageThresholdCritical
moClass: equipment:Psu
Type: environmental

fltEquipmentPsuVoltageThresholdNonRecoverable

Fault Code: F0391

Message:

[voltage]Power supply [id] in server [id] voltage: [voltage]

Explanation:

This fault occurs when the PSU voltage has exceeded the specified hardware voltage rating and PSU hardware may have been damaged as a result or may be at risk of being damaged.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Remove and reseat the PSU.

Step 2 If the above action did not resolve the issue, create a tech-support file and contact Cisco TAC.


Fault Details:

Severity: major
Cause: voltage-problem
mibFaultCode: 391
mibFaultName: fltEquipmentPsuVoltageThresholdNonRecoverable
moClass: equipment:Psu
Type: environmental

fltEquipmentPsuPerfThresholdNonCritical

Fault Code: F0392

Message:

[perf]Power supply [id] in server [id] output power: [perf]

Explanation:

This fault is raised as a warning if the current output of the PSU in a rack server does not match the desired output value.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Monitor the PSU status.

Step 2 If possible, remove and reseat the PSU.

Step 3 If the above action did not resolve the issue, create a tech-support file for the chassis, and contact Cisco TAC.


Fault Details:

Severity: minor
Cause: power-problem
mibFaultCode: 392
mibFaultName: fltEquipmentPsuPerfThresholdNonCritical
moClass: equipment:Psu
Type: equipment

fltEquipmentPsuPerfThresholdCritical

Fault Code: F0393

Message:

[perf]Power supply [id] in server [id] output power: [perf]

Explanation:

This fault is raised as a warning if the current output of the PSU in a rack server does not match the desired output value.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Monitor the PSU status.

Step 2 If possible, remove and reseat the PSU.

Step 3 If the above action did not resolve the issue, create a tech-support file for the chassis, and contact Cisco TAC.


Fault Details:

Severity: warning
Cause: power-problem
mibFaultCode: 393
mibFaultName: fltEquipmentPsuPerfThresholdCritical
moClass: equipment:Psu
Type: equipment

fltEquipmentPsuPerfThresholdNonRecoverable

Fault Code: F0394

Message:

[perf] Power supply [id] in server [id] output power: [perf]

Explanation:

This fault is raised as a warning if the current output of the PSU in a rack server does not match the desired output value.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Monitor the PSU status.

Step 2 If possible, remove and reseat the PSU.

Step 3 If the above action did not resolve the issue, create a tech-support file for the chassis, and contact Cisco TAC.


Fault Details:

Severity: major
Cause: power-problem
mibFaultCode: 394
mibFaultName: fltEquipmentPsuPerfThresholdNonRecoverable
moClass: equipment:Psu
Type: equipment

fltEquipmentPsuIdentity

Fault Code: F0407

Message:

Power supply [id] on chassis [id] has a malformed FRU

Power supply [id] on server [id] has a malformed FRU

Explanation:

This fault typically occurs when the FRU information for a power supply unit is corrupted or malformed.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Verify the vendor specification for the power supply.

Step 2 If the above action did not resolve the issue, create a tech-support file and contact Cisco TAC.


Fault Details:

Severity: critical
Cause: fru-problem
mibFaultCode: 407
mibFaultName: fltEquipmentPsuIdentity
moClass: equipment:Psu
Type: equipment

fltPowerChassisMemberChassisPsuRedundanceFailure

Fault Code: F0743

Message

Chassis [id] was configured for redundancy, but running in a non-redundant configuration.

Explanation

This fault typically occurs when chassis power redundancy has failed.

Recommended Action

If you see this fault, take the following actions:


Step 1 Consider adding more PSUs to the chassis.

Step 2 Replace any non-functional PSUs.

Step 3 If the above actions did not resolve the issue, create a show tech-support file and contact Cisco TAC.


Fault Details

Severity: major
Cause: psu-redundancy-fail
mibFaultCode: 743
mibFaultName: fltPowerChassisMemberChassisPsuRedundanceFailure
moClass: power:ChassisMember
Type: environmental
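
The redundancy condition behind this fault can be sketched as a count of operable PSUs against the number the load requires. The following Python sketch assumes an N+1 policy (one spare PSU beyond what the load needs). This guide does not state which redundancy policy applies, so the policy, the function name, and the parameters are all assumptions.

```python
def is_psu_redundant(operable_psus: int,
                     psus_required_for_load: int,
                     spares_required: int = 1) -> bool:
    """Return True if enough operable PSUs remain to cover load plus spares.

    spares_required=1 models an assumed N+1 policy.
    """
    if operable_psus < 0 or psus_required_for_load < 1 or spares_required < 0:
        raise ValueError("PSU counts are out of range")
    return operable_psus >= psus_required_for_load + spares_required
```

Under this assumption, a server that needs one PSU for its load is redundant with two operable PSUs and non-redundant with one, which matches the recommended actions of adding PSUs or replacing non-functional ones.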

fltEquipmentPsuPowerThreshold

Fault Code: F0882

Message:

Power supply [id] on chassis [id] has exceeded its power threshold

Power supply [id] on server [id] has exceeded its power threshold.

Explanation:

This fault occurs when a power supply unit is drawing too much current.

Recommended Action:

If you see this fault, create a tech-support file and contact Cisco TAC.


Fault Details:

Severity: critical
Cause: power-problem
mibFaultCode: 882
mibFaultName: fltEquipmentPsuPowerThreshold
moClass: equipment:Psu
Type: equipment

fltEquipmentPsuInputError

Fault Code: F0883

Message:

Power supply [id] on chassis [id] has disconnected cable or bad input voltage

Power supply [id] on server [id] has disconnected cable or bad input voltage.

Explanation:

This fault occurs when a power cable is disconnected or input voltage is incorrect.

Recommended Action:

If you see this fault, create a tech-support file and contact Cisco TAC.


Fault Details:

Severity: critical
Cause: power-problem
mibFaultCode: 883
mibFaultName: fltEquipmentPsuInputError
moClass: equipment:Psu
Type: equipment

Server-Related Faults

fltComputeBoardPowerError

Fault Code: F0310

Message:

Motherboard of server [chassisId]/[slotId] (service profile: [assignedToDn]) power: [operPower]

Motherboard of server [id] (service profile: [assignedToDn]) power: [operPower]

Explanation:

This fault typically occurs when the server power sensors have detected a problem.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Reseat or replace the power supply.

Step 2 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


Fault Details:

Severity: major
Cause: power-problem
mibFaultCode: 310
mibFaultName: fltComputeBoardPowerError
moClass: compute:Board
Type: environmental

fltComputePhysicalBiosPostTimeout

Fault Code: F0313

Message:

Server [id] (service profile: [assignedToDn]) BIOS failed power-on self test

Server [chassisId]/[slotId] (service profile: [assignedToDn]) BIOS failed power-on self test.

Explanation:

This fault typically occurs when the server has encountered a diagnostic failure.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Connect to the CIMC WebUI and use the KVM to record where the POST failure occurred.

Step 2 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


Fault Details:

Severity: critical
Cause: equipment-inoperable
mibFaultCode: 313
mibFaultName: fltComputePhysicalBiosPostTimeout
moClass: compute:Physical
Type: equipment

fltComputeBoardCmosVoltageThresholdCritical

Fault Code: F0424

Message:

Possible loss of CMOS settings: CMOS battery voltage on server [chassisId]/[slotId] is [cmosVoltage]

Possible loss of CMOS settings: CMOS battery voltage on server [id] is [cmosVoltage]

Explanation:

This fault is raised when the CMOS battery voltage has dropped to lower than the normal operating range. This could impact the clock and other CMOS settings.

Recommended Action:

If you see this fault, replace the battery.


Fault Details:

Severity: critical
Cause: voltage-problem
mibFaultCode: 424
mibFaultName: fltComputeBoardCmosVoltageThresholdCritical
moClass: compute:Board
Type: environmental

fltComputeBoardCmosVoltageThresholdNonRecoverable

Fault Code: F0425

Message:

Possible loss of CMOS settings: CMOS battery voltage on server [chassisId]/[slotId] is [cmosVoltage]

Possible loss of CMOS settings: CMOS battery voltage on server [id] is [cmosVoltage]

Explanation:

This fault is raised when the CMOS battery voltage has dropped quite low and is unlikely to recover. This impacts the clock and other CMOS settings.

Recommended Action:

If you see this fault, replace the battery.


Fault Details:

Severity: major
Cause: voltage-problem
mibFaultCode: 425
mibFaultName: fltComputeBoardCmosVoltageThresholdNonRecoverable
moClass: compute:Board
Type: environmental

fltComputeIOHubThermalNonCritical

Fault Code: F0538

Message:

IO Hub on server [chassisId]/[slotId] temperature: [thermal]

Explanation:

This fault is raised when the IO controller temperature is outside the upper or lower non-critical threshold.

Recommended Action:

If you see this fault, monitor other environmental events related to this server and ensure the temperature ranges are within recommended ranges.


Fault Details:

Severity: minor
Cause: thermal-problem
mibFaultCode: 538
mibFaultName: fltComputeIOHubThermalNonCritical
moClass: compute:IOHub
Type: environmental

fltComputeIOHubThermalThresholdCritical

Fault Code: F0539

Message:

IO Hub on server [chassisId]/[slotId] temperature: [thermal]

Explanation:

This fault is raised when the IO controller temperature is outside the upper or lower critical threshold.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Monitor other environmental events related to the server and ensure the temperature ranges are within recommended ranges.

Step 2 Consider turning off the server for a while if possible.

Step 3 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


Fault Details:

Severity: major
Cause: thermal-problem
mibFaultCode: 539
mibFaultName: fltComputeIOHubThermalThresholdCritical
moClass: compute:IOHub
Type: environmental

fltComputeIOHubThermalThresholdNonRecoverable

Fault Code: F0540

Message:

IO Hub on server [chassisId]/[slotId] temperature: [thermal]

Explanation:

This fault is raised when the IO controller temperature is outside the recoverable range of operation.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Shut down the server immediately.

Step 2 Create a tech-support file and contact Cisco TAC.


Fault Details:

Severity: critical
Cause: thermal-problem
mibFaultCode: 540
mibFaultName: fltComputeIOHubThermalThresholdNonRecoverable
moClass: compute:IOHub
Type: environmental
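
The three IO Hub thermal faults escalate from monitoring to an immediate shutdown. The following Python sketch restates the recommended actions from these three entries as a lookup table, for example as a starting point for a monitoring script. The table and function names are hypothetical.

```python
# Fault code -> recommended actions, restated from the three entries above.
IO_HUB_THERMAL_ACTIONS = {
    "F0538": [
        "Monitor other environmental events and confirm temperatures are within recommended ranges.",
    ],
    "F0539": [
        "Monitor other environmental events and confirm temperatures are within recommended ranges.",
        "Consider turning off the server for a while if possible.",
        "Create a tech-support file and contact Cisco TAC.",
    ],
    "F0540": [
        "Shut down the server immediately.",
        "Create a tech-support file and contact Cisco TAC.",
    ],
}


def recommended_actions(fault_code: str) -> list:
    """Return the ordered action list for an IO Hub thermal fault code."""
    if fault_code not in IO_HUB_THERMAL_ACTIONS:
        raise KeyError(f"not an IO Hub thermal fault: {fault_code}")
    return list(IO_HUB_THERMAL_ACTIONS[fault_code])


def requires_immediate_shutdown(fault_code: str) -> bool:
    """Only the non-recoverable fault (F0540) calls for an immediate shutdown."""
    return recommended_actions(fault_code)[0] == "Shut down the server immediately."
```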

fltComputePhysicalPostFailure

Fault Code: F0517

Message:

Server [id] POST or diagnostic failure

Server [chassisId]/[slotId] POST or diagnostic failure.

Explanation:

This fault typically occurs when the server has encountered a diagnostic failure or an error during POST.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Check the POST result for the server.

Step 2 Reboot the server.

Step 3 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


Fault Details:

Severity: major
Cause: equipment-problem
mibFaultCode: 517
mibFaultName: fltComputePhysicalPostFailure
moClass: compute:Physical
Type: server

fltComputeBoardPowerFail

Fault Code: F0868

Message:

[power]Motherboard of server [id] power: [power]

Explanation:

This fault typically occurs when the power sensors on a server detect a problem.

Recommended Action:

If you see this fault, create a tech-support file and contact Cisco TAC.


Fault Details:

Severity: critical
Cause: power-problem
mibFaultCode: 868
mibFaultName: fltComputeBoardPowerFail
moClass: compute:Board
Type: environmental

fltComputeBoardThermalProblem

Fault Code: F0869

Message:

Motherboard of server [chassisId]/[slotId] (service profile: [assignedToDn]) thermal: [thermal]

Motherboard of server [id] (service profile: [assignedToDn]) thermal: [thermal]

Explanation:

This fault typically occurs when the motherboard thermal sensors on a server detect a problem.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Verify that the server fans are working properly.

Step 2 Wait for 24 hours to see if the problem resolves itself.

Step 3 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


Fault Details:

Severity: major
Cause: thermal-problem
mibFaultCode: 869
mibFaultName: fltComputeBoardThermalProblem
moClass: compute:Board
Type: environmental

fltComputeBoardMotherBoardVoltageUpperThresholdCritical

Fault Code: F0920

Message:

"sys/rack-unit-1/board"

Explanation:

This fault typically occurs when one or more motherboard input voltages have exceeded the upper critical thresholds.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Reseat or replace the power supply.

Step 2 If the issue persists, create a tech-support file and contact Cisco TAC.


Fault Details:

Severity: major 
Cause: voltage-problem
mibFaultCode: 920
mibFaultName: fltComputeBoardMotherBoardVoltageUpperThresholdCritical
moClass: compute:Board
Type: environmental

fltComputeBoardPowerUsageProblem

Fault Code: F1040

Message:

"sys/rack-unit-1/board"

Explanation:

This fault typically occurs when the motherboard power consumption exceeds certain threshold limits. When this happens, the power usage sensors on a server detect a problem.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Contact Cisco TAC.


Fault Details:

Severity: warning 
Cause: power-problem
mibFaultCode: 1040
mibFaultName: fltComputeBoardPowerUsageProblem
moClass: compute:Board
Type: environmental

fltComputeBoardMotherBoardVoltageThresholdUpperNonRecoverable

Fault Code: F0918

Message:

"sys/rack-unit-1/board"

Explanation:

This fault typically occurs when one or more motherboard input voltages have become too high and are unlikely to recover.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Contact Cisco TAC.


Fault Details:

Severity: critical
Cause: voltage-problem
mibFaultCode: 918
mibFaultName: fltComputeBoardMotherBoardVoltageThresholdUpperNonRecoverable
moClass: compute:Board
Type: environmental

fltComputeBoardMotherBoardVoltageThresholdLowerNonRecoverable

Fault Code: F0919

Message:

"sys/rack-unit-1/board"

Explanation:

This fault typically occurs when one or more motherboard input voltages have dropped too low and are unlikely to recover.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Contact Cisco TAC.


Fault Details:

Severity: critical
Cause: voltage-problem
mibFaultCode: 919
mibFaultName: fltComputeBoardMotherBoardVoltageThresholdLowerNonRecoverable
moClass: compute:Board
Type: environmental

fltComputeBoardMotherBoardVoltageLowerThresholdCritical

Fault Code: F0921

Message:

"sys/rack-unit-1/board"

Explanation:

This fault typically occurs when one or more motherboard input voltages have crossed the lower critical thresholds.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Reseat or replace the power supply.

Step 2 If the issue persists, create a tech-support file and contact Cisco TAC.


Fault Details:

Severity: major
Cause: voltage-problem
mibFaultCode: 921
mibFaultName: fltComputeBoardMotherBoardVoltageLowerThresholdCritical
moClass: compute:Board
Type: environmental
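
The four motherboard voltage faults (F0918 through F0921) differ only in the direction of the excursion and in severity. The following Python sketch tabulates them as listed in their Fault Details and captures the one difference in recommended action: the two critical-threshold faults start with reseating or replacing the power supply, while the two non-recoverable faults go straight to Cisco TAC. The type and function names are illustrative only.

```python
from typing import NamedTuple


class VoltageFault(NamedTuple):
    name: str
    direction: str  # "upper" or "lower" threshold crossed
    severity: str   # severity listed in Fault Details


MOTHERBOARD_VOLTAGE_FAULTS = {
    "F0918": VoltageFault(
        "fltComputeBoardMotherBoardVoltageThresholdUpperNonRecoverable", "upper", "critical"),
    "F0919": VoltageFault(
        "fltComputeBoardMotherBoardVoltageThresholdLowerNonRecoverable", "lower", "critical"),
    "F0920": VoltageFault(
        "fltComputeBoardMotherBoardVoltageUpperThresholdCritical", "upper", "major"),
    "F0921": VoltageFault(
        "fltComputeBoardMotherBoardVoltageLowerThresholdCritical", "lower", "major"),
}


def psu_reseat_is_first_step(fault_code: str) -> bool:
    """True for the faults whose Step 1 is to reseat or replace the power supply."""
    # In this guide, those are the two faults listed with severity "major".
    return MOTHERBOARD_VOLTAGE_FAULTS[fault_code].severity == "major"
```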

fltMemoryUnitECCThresholdNonCritical

Fault Code: F2500

Message:

"sys/rack-unit-1/board/memarray-%d/mem-%d"

Explanation:

This fault indicates that the memory DIMM has crossed a non-critical threshold of reported ECC errors.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Continue to monitor the ECC errors reported by the memory DIMM. If the error count exceeds the non-recoverable threshold, replace the memory DIMM.

Step 2 Monitor the server for temperature/voltage thresholds.


Fault Details:

Severity: minor
Cause: equipment-degraded
mibFaultCode: 2500
mibFaultName: fltMemoryUnitECCThresholdNonCritical
moClass: memory:Unit
Type: equipment
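
The message for this fault is a printf-style path, "sys/rack-unit-1/board/memarray-%d/mem-%d", in which the two %d fields are filled with numbers. The following Python sketch builds and parses such a path. It assumes the first field is the memory array number and the second is the DIMM number, which the path segment names suggest but this guide does not state. The helper names are hypothetical.

```python
import re

MEM_DN_FORMAT = "sys/rack-unit-1/board/memarray-%d/mem-%d"
MEM_DN_PATTERN = re.compile(r"^sys/rack-unit-1/board/memarray-(\d+)/mem-(\d+)$")


def build_mem_dn(array_id: int, dimm_id: int) -> str:
    """Fill the two %d fields of the path shown in the fault message."""
    return MEM_DN_FORMAT % (array_id, dimm_id)


def parse_mem_dn(dn: str) -> tuple:
    """Extract (array_id, dimm_id) from a path in the format above."""
    match = MEM_DN_PATTERN.match(dn)
    if match is None:
        raise ValueError(f"not a memory unit path: {dn!r}")
    return int(match.group(1)), int(match.group(2))
```

Parsing the path this way lets a script identify which DIMM to keep monitoring for further ECC errors.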

fltMemoryUnitECCThresholdCritical

Fault Code: F2501

Message:

"sys/rack-unit-1/board/memarray-%d/mem-%d"

Explanation:

This fault indicates that the memory DIMM has crossed a critical threshold of reported ECC errors.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Continue to monitor the ECC errors reported by the memory DIMM. If the error count exceeds the non-recoverable threshold, replace the memory DIMM.

Step 2 Monitor the server for temperature/voltage thresholds.


Fault Details:

Severity: warning
Cause: equipment-degraded
mibFaultCode: 2501
mibFaultName: fltMemoryUnitECCThresholdCritical
moClass: memory:Unit
Type: equipment

fltMemoryUnitECCThresholdNonRecoverable

Fault Code: F2502

Message:

"sys/rack-unit-1/board/memarray-%d/mem-%d"

Explanation:

This fault indicates that the memory DIMM has crossed a non-recoverable threshold of reported ECC errors.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Replace the memory DIMM.


Fault Details:

Severity: major
Cause: equipment-inoperable
mibFaultCode: 2502
mibFaultName: fltMemoryUnitECCThresholdNonRecoverable
moClass: memory:Unit
Type: equipment

Storage-Related Faults

fltStorageLocalDiskInoperable

Fault Code: F0181

Message:

Local disk [id] on server [chassisId]/[slotId] operability: [operability]Local disk [id] on server [id] operability: [operability]

Explanation:

This fault occurs when the local disk has become inoperable or has been removed while the server was in use.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Insert the disk in a supported slot.

Step 2 Remove and reinsert the local disk.

Step 3 Replace the disk, if an additional disk is available.

Step 4 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


Fault Details:

Severity: major
Cause: equipment-inoperable
mibFaultCode: 181
mibFaultName: fltStorageLocalDiskInoperable
moClass: storage:LocalDisk

fltStorageRaidBatteryInoperable

Fault Code: F0531

Message:

RAID Battery on server [chassisId]/[slotId] operability: [operability]RAID Battery on server [id] operability: [operability]

Explanation:

This fault occurs when the RAID battery voltage is below the normal operating range.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Replace the RAID battery.

Step 2 If the above action did not resolve the issue, create a tech-support file and contact Cisco TAC.


Fault Details:

Severity: major
Cause: equipment-inoperable
mibFaultCode: 531
mibFaultName: fltStorageRaidBatteryInoperable
moClass: storage:RaidBattery
Type: equipment

fltStorageLocalDiskCopybackFailed

Fault Code: F0978

Message:

"sys/rack-unit-1/board/storage-%s-ctlr-%d/pd-%d"

Explanation:

This fault indicates a physical disk copyback failure. This fault could indicate a physical drive problem or an issue with the RAID configuration.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Replace the physical drive and check to see if the issue is resolved after a rebuild.

Step 2 Reseat or replace the storage controller.

Step 3 Check configuration options for the storage controller in the MegaRAID ROM configuration page.


Fault Details:

Severity: warning
Cause: equipment-offline
mibFaultCode: 978
mibFaultName: fltStorageLocalDiskCopybackFailed
moClass: storage:LocalDisk
Type: equipment 

fltStorageRaidBatteryDegraded

Fault Code: F0969

Message:

"sys/rack-unit-1/board/storage-%s-ctlr-%d/raid-battery-%d"

Explanation:

This fault indicates a controller battery backup unit failure.

Recommended Action:

If you see this fault, take the following action:


Step 1 Reseat or replace the battery backup unit on the storage controller.


Fault Details:

Severity: warning
Cause: equipment-degraded
mibFaultCode: 969
mibFaultName: fltStorageRaidBatteryDegraded
moClass: storage:RaidBattery
Type: equipment

fltStorageRaidBatteryRelearnAborted

Fault Code: F0970

Message:

"sys/rack-unit-1/board/storage-%s-ctlr-%d/raid-battery-%d"

Explanation:

This fault indicates that a controller battery relearn was aborted.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Restart the relearn process for the battery backup unit.

Step 2 Reseat or replace the battery backup unit.

Step 3 Replace the battery backup unit if it has exceeded 100 relearn cycles.


Fault Details:

Severity: info
Cause: equipment-degraded
mibFaultCode: 970
mibFaultName: fltStorageRaidBatteryRelearnAborted
moClass: storage:RaidBattery
Type: equipment 

fltStorageRaidBatteryRelearnFailed

Fault Code: F0971

Message:

"sys/rack-unit-1/board/storage-%s-ctlr-%d/raid-battery-%d"

Explanation:

This fault indicates a controller battery relearn failure.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Restart the relearn process for the battery backup unit.

Step 2 Reseat or replace the battery backup unit.

Step 3 Replace the battery backup unit if it has exceeded 100 relearn cycles.


Fault Details:

Severity: warning
Cause: equipment-degraded
mibFaultCode: 971
mibFaultName: fltStorageRaidBatteryRelearnFailed
moClass: storage:RaidBattery
Type: equipment

fltStorageVirtualDriveConsistencyCheckFailed

Fault Code: F0982

Message:

"sys/rack-unit-1/board/storage-%s-ctlr-%d/vd-%d"

Explanation:

This fault indicates a consistency check failure with the virtual drive.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Initiate a consistency check on the virtual drive.

Step 2 Replace any faulty physical drives.


Fault Details:

Severity: warning
Cause: equipment-degraded
mibFaultCode: 982
mibFaultName: fltStorageVirtualDriveConsistencyCheckFailed
moClass: storage:VirtualDrive
Type: equipment

fltStorageVirtualDriveDegraded

Fault Code: F1008

Message:

"sys/rack-unit-1/board/storage-%s-ctlr-%d/vd-%d"

Explanation:

This fault indicates a recoverable error with the virtual drive.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Initiate a consistency check on the virtual drive.

Step 2 Replace any faulty physical drives.


Fault Details:

Severity: warning
Cause: equipment-degraded
mibFaultCode: 1008
mibFaultName: fltStorageVirtualDriveDegraded
moClass: storage:VirtualDrive
Type: equipment

fltStorageVirtualDriveInoperable

Fault Code: F1007

Message:

"sys/rack-unit-1/board/storage-%s-ctlr-%d/vd-%d"

Explanation:

This fault indicates a non-recoverable error with the virtual drive.

Recommended Action:

If you see this fault, take the following actions:


Step 1 If the data on the drive is accessible, back up and recreate the virtual drive.

Step 2 Replace any faulty physical drives.

Step 3 Check for controller errors in the MegaRAID ROM page logs.


Fault Details:

Severity: major
Cause: equipment-inoperable
mibFaultCode: 1007
mibFaultName: fltStorageVirtualDriveInoperable
moClass: storage:VirtualDrive
Type: equipment

fltStorageVirtualDriveReconstructionFailed

Fault Code: F0981

Message:

"sys/rack-unit-1/board/storage-%s-ctlr-%d/vd-%d"

Explanation:

This fault indicates a failure in the reconstruction process of the virtual drive.

Recommended Action:

If you see this fault, take the following action:


Step 1 Restart the reconstruction process.


Fault Details:

Severity: warning
Cause: equipment-degraded
mibFaultCode: 981
mibFaultName: fltStorageVirtualDriveReconstructionFailed
moClass: storage:VirtualDrive
Type: equipment

fltStorageControllerInoperable

Fault Code: F0976

Message:

"sys/rack-unit-1/board/storage-%s-ctlr-%d"

Explanation:

This fault indicates a non-recoverable storage controller failure. It is raised when the storage system has been unable to contact the controller for a period of time and stops retrying.

Recommended Action:

If you see this fault, take the following action:


Step 1 Reseat or replace the storage controller.


Fault Details:

Severity: warning
Cause: equipment-inoperable
mibFaultCode: 976
mibFaultName: fltStorageControllerInoperable
moClass: storage:Controller
Type: equipment

fltStorageControllerPatrolReadFailed

Fault Code: F1003

Message:

"sys/rack-unit-1/board/storage-%s-ctlr-%d"

Explanation:

This fault indicates that the review of the storage system for potential physical disk errors has failed.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Initiate a consistency check on the virtual drive.

Step 2 Replace any faulty physical drives.


Fault Details:

Severity: warning
Cause: equipment-inoperable
mibFaultCode: 1003
mibFaultName: fltStorageControllerPatrolReadFailed
moClass: storage:Controller
Type: equipment
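
For triage, the storage faults above can be grouped by severity. The following Python sketch builds a lookup table from the Fault Code, mibFaultName, and Severity values of a subset of the storage entries listed above, then selects the faults at or above a given severity. The relative ordering of the severities (info below warning below major) and the function name are assumptions made for illustration.

```python
# Assumed ordering for the severities that appear in this section.
SEVERITY_RANK = {"info": 0, "warning": 1, "major": 2}

# Fault code -> (mibFaultName, severity), copied from the entries above.
STORAGE_FAULTS = {
    "F0531": ("fltStorageRaidBatteryInoperable", "major"),
    "F0978": ("fltStorageLocalDiskCopybackFailed", "warning"),
    "F0969": ("fltStorageRaidBatteryDegraded", "warning"),
    "F0970": ("fltStorageRaidBatteryRelearnAborted", "info"),
    "F0971": ("fltStorageRaidBatteryRelearnFailed", "warning"),
    "F0982": ("fltStorageVirtualDriveConsistencyCheckFailed", "warning"),
    "F1008": ("fltStorageVirtualDriveDegraded", "warning"),
    "F1007": ("fltStorageVirtualDriveInoperable", "major"),
    "F0981": ("fltStorageVirtualDriveReconstructionFailed", "warning"),
    "F0976": ("fltStorageControllerInoperable", "warning"),
    "F1003": ("fltStorageControllerPatrolReadFailed", "warning"),
}


def faults_at_or_above(min_severity: str) -> list:
    """Return the fault codes whose severity is at least min_severity, sorted."""
    threshold = SEVERITY_RANK[min_severity]
    return sorted(
        code
        for code, (_name, severity) in STORAGE_FAULTS.items()
        if SEVERITY_RANK[severity] >= threshold
    )


if __name__ == "__main__":
    for code in faults_at_or_above("major"):
        print(code, STORAGE_FAULTS[code][0])
```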

System Event Log-Related Faults

fltSysdebugMEpLogMEpLogVeryLow

Fault Code: F0461

Message:

Log capacity on Management Controller on server [id] is [capacity]

Explanation:

This fault typically occurs because the Cisco Integrated Management Controller (CIMC) has detected that the system event log (SEL) on the server is almost full and its available capacity is very low. This is an info-level fault and can be ignored if you do not want to clear the SEL at this time.

Recommended Action:

If you see this fault, you can clear the SEL, if desired.

Fault Details:

Severity: info
Cause: log-capacity
mibFaultCode: 461
mibFaultName: fltSysdebugMEpLogMEpLogVeryLow
moClass: sysdebug:MEpLog
Type: operational

fltSysdebugMEpLogMEpLogFull

Fault Code: F0462

Message:

Log capacity on Management Controller on server [id] is [capacity]

Explanation:

This fault typically occurs because CIMC could not transfer the SEL file to the location specified in the SEL policy. This is an info-level fault and can be ignored if you do not want to clear the SEL at this time.

Recommended Action:

If you see this fault, take the following actions:


Step 1 Verify the configuration of the SEL policy to ensure that the location, user, and password provided are correct.

Step 2 If you do want to transfer and clear the SEL and the above action did not resolve the issue, create a tech-support file and contact Cisco TAC.


Fault Details:

Severity: info
Cause: log-capacity
mibFaultCode: 462
mibFaultName: fltSysdebugMEpLogMEpLogFull
moClass: sysdebug:MEpLog
Type: operational
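
Both SEL faults share a message template with bracketed placeholders. The following Python sketch shows how such a template might be filled in when generating or matching log text. The helper name and the example values (server "1", capacity "very-low") are hypothetical and are not taken from CIMC output.

```python
import re

# Message template copied from the two SEL fault entries above.
SEL_MESSAGE = "Log capacity on Management Controller on server [id] is [capacity]"


def render_message(template: str, **values: str) -> str:
    """Replace each [name] placeholder with values[name]; leave unknown ones as-is."""

    def substitute(match):
        name = match.group(1)
        return str(values.get(name, match.group(0)))

    return re.sub(r"\[(\w+)\]", substitute, template)


if __name__ == "__main__":
    # Example values are illustrative only.
    print(render_message(SEL_MESSAGE, id="1", capacity="very-low"))
```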