Cisco UCS C-Series Servers Integrated Management Controller Faults Reference Guide

Step 2 If the fault reports a "Thermal Sensor threshold crossing in IOM" error for one or both of the IOMs, check whether thermal faults have been raised against that IOM. Those faults include details of the thermal condition.

Step 3 If the fault reports a "Missing or Faulty Fan" error, check on the status of that fan. If it needs replacement, create a tech-support file for the chassis and contact Cisco TAC.

Step 4 If the above actions did not resolve the issue and the condition persists, create a tech-support file for the chassis and contact Cisco TAC.

Fault Details:

Severity: major

Cause: thermal-problem

mibFaultCode: 409

mibFaultName: fltEquipmentChassisThermalThresholdCritical

moClass: equipment:Chassis

Type: environmental

fltEquipmentChassisThermalThresholdNonCritical

Fault Code: F0410

Message:

Thermal condition on chassis [id] cause: [thermalStateQualifier]
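The message above is a template whose bracketed fields are filled in when the fault is raised. As a minimal sketch, assuming fault messages are available as plain strings, the following Python turns such a template into a regular expression and extracts the fields. The function names and the sample message in the usage note are illustrative assumptions, not part of any Cisco tool.

```python
import re

# Template copied from the Message field above.
TEMPLATE = "Thermal condition on chassis [id] cause: [thermalStateQualifier]"


def template_to_regex(template):
    """Compile a message template with [name] placeholders into a regex.

    Assumes each placeholder name appears only once in the template.
    """
    # re.split with a capturing group alternates literal text and names.
    parts = re.split(r"\[(\w+)\]", template)
    pattern = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pattern += re.escape(part)       # literal text
        else:
            pattern += f"(?P<{part}>.+?)"    # named field
    return re.compile(pattern)


def parse_message(template, message):
    """Return the placeholder values as a dict, or None if no match."""
    match = template_to_regex(template).fullmatch(message)
    return match.groupdict() if match else None
```

For example, parse_message(TEMPLATE, "Thermal condition on chassis 1 cause: Missing or Faulty Fan") would yield the chassis id and the qualifier text as separate fields.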

Explanation:

This fault occurs under the following condition:

•If a component within a chassis is operating outside the safe thermal operating range.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Check the temperature readings for the IOM and ensure it is within the recommended thermal safe operating range.

Step 3 If the fault reports a "Missing or Faulty Fan" error, check on the status of that fan. If it needs replacement, create a tech-support file for the chassis and contact Cisco TAC.

Step 4 If the above actions did not resolve the issue and the condition persists, create a tech-support file for the chassis and contact Cisco TAC.

Fault Details:

Severity: minor

Cause: thermal-problem

mibFaultCode: 410

mibFaultName: fltEquipmentChassisThermalThresholdNonCritical

moClass: equipment:Chassis

Type: environmental
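In the entries listed in this section, the Fault Code is the mibFaultCode zero-padded to four digits and prefixed with F (for example, F0410 and 410). A small sketch of that conversion follows. The helper names are hypothetical, and the four-digit padding is an assumption drawn only from the codes shown here.

```python
def fault_code_to_mib(fault_code):
    """Convert a Fault Code such as 'F0410' to its numeric mibFaultCode."""
    if not (fault_code.startswith("F") and fault_code[1:].isdigit()):
        raise ValueError(f"unexpected fault code format: {fault_code!r}")
    return int(fault_code[1:])


def mib_to_fault_code(mib_fault_code):
    """Convert a numeric mibFaultCode such as 410 to the form 'F0410'."""
    # Assumption: codes are padded to four digits, as in this section.
    return f"F{mib_fault_code:04d}"
```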

fltEquipmentChassisThermalThresholdNonRecoverable

Fault Code: F0411

Message:

Thermal condition on chassis [id] cause: [thermalStateQualifier]

Explanation:

This fault occurs under the following condition:

•If a component within a chassis is operating outside the safe thermal operating range.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Check the temperature readings for the IOM and ensure it is within the recommended thermal safe operating range.

Step 3 If the fault reports a "Missing or Faulty Fan" error, check on the status of that fan. If it needs replacement, create a tech-support file for the chassis and contact Cisco TAC.

Step 4 If the above actions did not resolve the issue and the condition persists, create a tech-support file for the chassis and contact Cisco TAC.

Fault Details:

Severity: critical

Cause: thermal-problem

mibFaultCode: 411

mibFaultName: fltEquipmentChassisThermalThresholdNonRecoverable

moClass: equipment:Chassis

Type: environmental

Fan-Related Faults

fltEquipmentFanDegraded

Fault Code: F0371

Message:

Fan [id] in Fan Module: [operability]

Fan [id] in Fan Module [tray]-[id] under server [id] operability: [operability]

Explanation:

This fault occurs when one or more fans in a fan module are not operational, but at least one fan is operational.
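The condition described above (some fans down, but not all) can be sketched as a simple predicate. This is only an illustration of the stated rule; the function name and the list-of-booleans input are assumptions, not how the controller represents fan state.

```python
from typing import Sequence


def fan_module_is_degraded(fan_operational: Sequence[bool]) -> bool:
    """True when at least one fan is down and at least one is still running."""
    return any(fan_operational) and not all(fan_operational)
```

A module with every fan running, or with every fan down, does not satisfy this condition.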

Recommended Action:

If you see this fault, take the following actions:

Step 1 Review the product specifications to determine the temperature operating range of the fan module.

Step 2 Review the Cisco UCS Site Preparation Guide and ensure the fan module has adequate airflow, including front and back clearance.

Step 3 Verify that the air flows are not obstructed.

Step 4 Verify that the site cooling system is operating properly.

Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.

Step 6 Replace the faulty fan modules.

Step 7 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.

Fault Details:

Severity: minor

Cause: equipment-degraded

mibFaultCode: 371

mibFaultName: fltEquipmentFanDegraded

moClass: equipment:Fan

Type: equipment
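The Fault Details fields are simple "name: value" pairs, but note that moClass values themselves contain a colon (equipment:Fan). A minimal sketch of reading such a block into a dictionary, assuming the block is available as plain text and using a hypothetical helper name, splits each line on the first colon only:

```python
def parse_fault_details(text):
    """Parse 'name: value' lines into a dict, splitting on the first colon."""
    details = {}
    for line in text.splitlines():
        key, separator, value = line.partition(":")
        # Skip blank lines and headings such as 'Fault Details:'.
        if separator and value.strip():
            details[key.strip()] = value.strip()
    return details
```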

fltEquipmentFanInoperable

Fault Code: F0373

Message:

Fan [id] in Fan Module: [operability]

Fan [id] in Fan Module [tray]-[id] under server [id] operability: [operability]

Explanation:

This fault occurs if a fan is not operational.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Remove the fan module and reinstall it. Remove only one fan module at a time.

Step 2 Replace the fan module with a different fan module.

Step 3 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.

Fault Details:

Severity: major

Cause: equipment-inoperable

mibFaultCode: 373

mibFaultName: fltEquipmentFanInoperable

moClass: equipment:Fan

Type: equipment

fltEquipmentFanModuleMissing

Fault Code: F0377

Message:

Fan module [tray]-[id] in server [id] presence: [presence]

Explanation:

This fault occurs if a fan module slot is not equipped or a fan module has been removed from its slot.

Recommended Action:

If you see this fault, take the following actions:

Step 1 If the reported slot is empty, insert a fan module into the slot.

Step 2 If the reported slot contains a fan module, remove and reinsert the fan module.

Step 3 If the above action did not resolve the issue, create a tech-support file and contact Cisco TAC.

Fault Details:

Severity: warning

Cause: equipment-missing

mibFaultCode: 377

mibFaultName: fltEquipmentFanModuleMissing

moClass: equipment:FanModule

Type: equipment

fltEquipmentFanPerfThresholdNonCritical

Fault Code: F0395

Message:

Fan [id] in Fan Module [tray]-[id] under server [id] speed: [perf]

Explanation:

This fault occurs when the fan speed reading from the fan controller does not match the desired fan speed and is outside of the normal operating range. This can indicate a problem with a fan or with the reading from the fan controller.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Monitor the fan status.

Step 2 If the problem persists for a long period of time or if other fans do not show the same problem, reseat the fan.

Step 3 Replace the fan module.

Step 4 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.

Fault Details:

Severity: minor

Cause: performance-problem

mibFaultCode: 395

mibFaultName: fltEquipmentFanPerfThresholdNonCritical

moClass: equipment

fltEquipmentFanPerfThresholdCritical

Fault Code: F0396

Message:

Fan [id] in Fan Module [tray]-[id] under server [id] speed: [perf]

Explanation:

This fault occurs when the fan speed read from the fan controller does not match the desired fan speed, has exceeded the critical threshold, and the fan is at risk of failure. This can indicate a problem with a fan or with the reading from the fan controller.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Monitor the fan status.

Step 2 If the problem persists for a long period of time or if other fans do not show the same problem, reseat the fan.

Step 3 If the above actions did not resolve the issue, create a tech-support file for the chassis and contact Cisco TAC.

Fault Details:

Severity: warning

Cause: performance-problem

mibFaultCode: 396

mibFaultName: fltEquipmentFanPerfThresholdCritical

moClass: equipment:

fltEquipmentFanPerfThresholdNonRecoverable

Fault Code: F0397

Message:

Fan [id] in Fan Module [tray]-[id] under server [id] speed: [perf]

Explanation:

This fault occurs when the fan speed read from the fan controller has far exceeded the desired fan speed. It frequently indicates that the fan has failed.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Replace the fan.

Step 2 If the above action did not resolve the issue, create a tech-support file and contact Cisco TAC.

Fault Details:

Severity: major

Cause: performance-problem

mibFaultCode: 397

mibFaultName: fltEquipmentFanPerfThresholdNonRecoverable

moClass: equipment:Fan

Type: equipment

fltEquipmentFanMissing

Fault Code: F0434

Message:

Fan [id] in Fan Module [tray]-[id] under server [id] presence: [presence]

Explanation:

This fault occurs in the unlikely event that a fan in a fan module cannot be detected.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Insert/reinsert the fan module in the slot that is reporting the issue.

Step 2 Replace the fan module with a different fan module, if available.

Step 3 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.

Fault Details:

Severity: warning

Cause: equipment-missing

mibFaultCode: 434

mibFaultName: fltEquipmentFanMissing

moClass: equipment:Fan

Type: equipment

I/O Module-Related Faults

fltEquipmentIOCardRemoved

Fault Code: F0376

Message:

[side] IOM [chassisId]/[id] is removed.

Explanation:

This fault typically occurs because an I/O module is removed from the chassis. For a standalone configuration, the chassis associated with the I/O module loses network connectivity. This is a critical fault because it can result in the loss of network connectivity and disrupt data traffic through the I/O module.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Re-seat/re-insert the I/O module.

Step 2 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.

Fault Details:

Severity: critical

Cause: equipment-removed

mibFaultCode: 376

mibFaultName: fltEquipmentIOCardRemoved

moClass: equipment:IOCard

Type: equipment

fltEquipmentIOCardThermalProblem

Fault Code: F0379

Message:

[side] IOM [chassisId]/[id] operState: [operState]

Explanation:

This fault occurs when there is a thermal problem on an I/O module. Be aware of the following possible contributing factors:

•Temperature extremes can cause Cisco UCS equipment to operate at reduced efficiency and cause a variety of problems, including early degradation, failure of chips, and failure of equipment. In addition, extreme temperature fluctuations can cause CPUs to become loose in their sockets.

•Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).
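The inlet air range quoted above is given in both Fahrenheit and Celsius. As a minimal sketch, assuming an inlet temperature reading is available as a number, the following checks a reading against that range. The constant and function names are hypothetical; only the 10C and 35C limits come from this guide.

```python
# Inlet air limits stated in this guide: 50F (10C) to 95F (35C).
INLET_MIN_C = 10.0
INLET_MAX_C = 35.0


def fahrenheit_to_celsius(temp_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def celsius_to_fahrenheit(temp_c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return temp_c * 9.0 / 5.0 + 32.0


def inlet_temperature_in_range(temp_c):
    """True if the inlet air temperature is within the stated range."""
    return INLET_MIN_C <= temp_c <= INLET_MAX_C
```

A reading taken in Fahrenheit can be converted first, for example inlet_temperature_in_range(fahrenheit_to_celsius(72)).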

Recommended Action:

If you see this fault, take the following actions:

Step 1 Review the product specifications to determine the temperature operating range of the I/O module.

Step 2 Review the Cisco UCS Site Preparation Guide to ensure the I/O modules have adequate airflow, including front and back clearance.

Step 3 Verify that the air flows on the Cisco UCS chassis are not obstructed.

Step 4 Verify that the site cooling system is operating properly.

Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.

Step 6 Replace faulty I/O modules.

Step 7 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.

Fault Details:

Severity: major

Cause: thermal-problem

mibFaultCode: 379

mibFaultName: fltEquipmentIOCardThermalProblem

moClass: equipment:IOCard

Type: environmental

fltEquipmentIOCardThermalThresholdNonCritical

Fault Code: F0729

Message:

[side] IOM [chassisId]/[id] ([switchId]) temperature: [thermal]

Explanation:

This fault occurs when the temperature of an I/O module has exceeded a non-critical threshold value, but is still below the critical threshold. Be aware of the following possible contributing factors:

•Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).

•If sensors on a CPU reach 179.6F (82C), the system will take that CPU offline.
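The thermal faults in this guide form a ladder: non-critical, critical, and non-recoverable. The sketch below shows how a reading could be placed on that ladder. The threshold values are supplied by the caller and are purely hypothetical here, because this guide does not list the actual thresholds; consult the product specifications for real values.

```python
def classify_temperature(reading_c, non_critical_c, critical_c, non_recoverable_c):
    """Return the highest threshold level that the reading has exceeded.

    All thresholds are caller-supplied assumptions, in degrees Celsius,
    and are expected to be in ascending order.
    """
    if reading_c > non_recoverable_c:
        return "non-recoverable"
    if reading_c > critical_c:
        return "critical"
    if reading_c > non_critical_c:
        return "non-critical"
    return "ok"
```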

Recommended Action:

If you see this fault, take the following actions:

Step 1 Review the product specifications to determine the temperature operating range of the I/O module.

Step 2 Verify that the air flows on the Cisco UCS chassis and I/O module are not obstructed.

Step 3 Verify that the site cooling system is operating properly.

Step 4 Power off unused rack servers.

Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.

Step 6 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.

Fault Details:

Severity: minor

Cause: thermal-problem

mibFaultCode: 729

mibFaultName: fltEquipmentIOCardThermalThresholdNonCritical

moClass: equipment:IOCard

Type: environmental

fltEquipmentIOCardThermalThresholdCritical

Fault Code: F0730

Message:

[side] IOM [chassisId]/[id] ([switchId]) temperature: [thermal]

Explanation:

This fault occurs when the temperature of an I/O module has exceeded a critical threshold value. Be aware of the following possible contributing factors:

•Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).

•If sensors on a CPU reach 179.6F (82C), the system will take that CPU offline.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Review the product specifications to determine the temperature operating range of the I/O module.

Step 2 Verify that the site cooling system is operating properly.

Step 3 Power off unused rack servers.

Step 4 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.

Step 5 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.

Fault Details:

Severity: major

Cause: thermal-problem

mibFaultCode: 730

mibFaultName: fltEquipmentIOCardThermalThresholdCritical

moClass: equipment:IOCard

Type: environmental

fltEquipmentIOCardThermalThresholdNonRecoverable

Fault Code: F0731

Message:

[side] IOM [chassisId]/[id] temperature: [thermal]

Explanation:

This fault occurs when the temperature of an I/O module has been out of the operating range, and the issue is not recoverable. Be aware of the following possible contributing factors:

•Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).

•If sensors on a CPU reach 179.6F (82C), the system will take that CPU offline.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Review the product specifications to determine the temperature operating range of the I/O module.

Step 2 Verify that the air flows on the Cisco UCS chassis and I/O module are not obstructed.

Step 3 Verify that the site cooling system is operating properly.

Step 4 Power off unused rack servers.

Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.

Step 6 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.

Fault Details:

Severity: critical

Cause: thermal-problem

mibFaultCode: 731

mibFaultName: fltEquipmentIOCardThermalThresholdNonRecoverable

moClass: equipment:IOCard

Type: environmental

Memory-Related Faults

fltMemoryUnitDegraded

Fault Code: F0184

Message:

DIMM [location] on server [chassisId]/[slotId] operability: [operability]

DIMM [location] on server [id] operability: [operability]

Explanation:

This fault occurs when a DIMM is in a degraded operability state. This state typically occurs when an excessive number of correctable ECC errors are reported on the DIMM by the server BIOS.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Monitor the DIMM for further ECC errors. If the high number of errors persists, there is a high possibility of the DIMM becoming inoperable.

Step 2 If the DIMM becomes inoperable, replace the DIMM.

Step 3 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.
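Step 1 above asks you to watch whether correctable ECC errors keep accumulating. As a rough sketch, assuming you record the cumulative error count for a DIMM at regular intervals, the following checks whether every interval added new errors. The function name, the sampling approach, and the min_new_errors parameter are all hypothetical and are not defined by this guide.

```python
def ecc_errors_persisting(cumulative_counts, min_new_errors=1):
    """True if every interval between samples added at least min_new_errors.

    cumulative_counts: cumulative correctable ECC error counts, oldest first.
    """
    deltas = [
        later - earlier
        for earlier, later in zip(cumulative_counts, cumulative_counts[1:])
    ]
    # With fewer than two samples there is no interval to judge.
    return bool(deltas) and all(delta >= min_new_errors for delta in deltas)
```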

Fault Details:

Severity: warning

Cause: equipment-degraded

mibFaultCode: 184

mibFaultName: fltMemoryUnitDegraded

moClass: memory:Unit

Type: equipment

fltMemoryUnitInoperable

Fault Code: F0185

Message:

DIMM [location] on server [chassisId]/[slotId] operability: [operability]

DIMM [location] on server [id] operability: [operability]

Explanation:

This fault typically occurs because an above-threshold number of correctable or uncorrectable errors has occurred on a DIMM. The DIMM may be inoperable.

Recommended Action:

If you see this fault, take the following actions:

Step 1 If the SEL is enabled, review the SEL statistics on the DIMM to determine which threshold was crossed.

Step 2 If necessary, replace the DIMM.

Step 3 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.

Fault Details:

Severity: major

Cause: equipment-inoperable

mibFaultCode: 185

mibFaultName: fltMemoryUnitInoperable

moClass: memory:Unit

fltMemoryUnitThermalThresholdNonCritical

Fault Code: F0186

Message:

DIMM [location] on server [chassisId]/[slotId] temperature: [thermal]

DIMM [location] on server [id] temperature: [thermal]

Explanation:

This fault occurs when the temperature of a memory unit on a rack server exceeds a non-critical threshold value, but is still below the critical threshold. Be aware of the following possible contributing factors:

•Temperature extremes can cause Cisco UCS equipment to operate at reduced efficiency and cause a variety of problems, including early degradation, failure of chips, and failure of equipment. In addition, extreme temperature fluctuations can cause CPUs to become loose in their sockets.

•Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).

•If sensors on a CPU reach 179.6F (82C), the system will take that CPU offline.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Review the product specifications to determine the temperature operating range of the server.

Step 2 Review the Cisco UCS Site Preparation Guide to ensure the servers have adequate airflow, including front and back clearance.

Step 3 Verify that the air flows on the Cisco UCS chassis or rack server are not obstructed.

Step 4 Verify that the site cooling system is operating properly.

Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.

Step 6 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.

Fault Details:

Severity: minor

Cause: thermal-problem

mibFaultCode: 186

mibFaultName: fltMemoryUnitThermalThresholdNonCritical

moClass: memory:Unit

Type: environmental

fltMemoryUnitThermalThresholdCritical

Fault Code: F0187

Message:

DIMM [location] on server [chassisId]/[slotId] temperature: [thermal]

DIMM [location] on server [id] temperature: [thermal]

Explanation:

This fault occurs when the temperature of a memory unit on a rack server exceeds a critical threshold value. Be aware of the following possible contributing factors:

•Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).

•If sensors on a CPU reach 179.6F (82C), the system will take that CPU offline.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Review the product specifications to determine the temperature operating range of the server.

Step 2 Review the Cisco UCS Site Preparation Guide to ensure the servers have adequate airflow, including front and back clearance.

Step 3 Verify that the air flows on the Cisco UCS chassis or rack server are not obstructed.

Step 4 Verify that the site cooling system is operating properly.

Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.

Step 6 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.

Fault Details:

Severity: warning

Cause: thermal-problem

mibFaultCode: 187

mibFaultName: fltMemoryUnitThermalThresholdCritical

moClass: memory:Unit

Type: environmental

fltMemoryUnitThermalThresholdNonRecoverable

Fault Code: F0188

Message:

DIMM [location] on server [chassisId]/[slotId] temperature: [thermal]

DIMM [location] on server [id] temperature: [thermal]

Explanation:

This fault occurs when the temperature of a memory unit on a rack server has been out of the operating range, and the issue is not recoverable. Be aware of the following possible contributing factors:

•Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).

•If sensors on a CPU reach 179.6F (82C), the system will take that CPU offline.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Review the product specifications to determine the temperature operating range of the server.

Step 2 Review the Cisco UCS Site Preparation Guide to ensure the servers have adequate airflow, including front and back clearance.

Step 3 Verify that the air flows on the Cisco UCS chassis or rack server are not obstructed.

Step 4 Verify that the site cooling system is operating properly.

Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.

Step 6 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.

Fault Details:

Severity: major

Cause: thermal-problem

mibFaultCode: 188

mibFaultName: fltMemoryUnitThermalThresholdNonRecoverable

moClass: memory:Unit

Type: environmental

fltMemoryArrayVoltageThresholdCritical

Fault Code: F0190

Message:

Memory array [id] on server [chassisId]/[slotId] voltage: [voltage]

Memory array [id] on server [id] voltage: [voltage]

Explanation:

This fault occurs when the memory array voltage exceeds the specified hardware voltage rating.

Recommended Action:

If you see this fault, take the following actions:

Step 1 If the SEL is enabled, look at the SEL statistics on the DIMM to determine which threshold was crossed.

Step 2 Monitor the memory array for further degradation.

Step 3 Replace the power supply.

Step 4 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.

Fault Details:

Severity: major

Cause: voltage-problem

mibFaultCode: 190

mibFaultName: fltMemoryArrayVoltageThresholdCritical

moClass: memory:Array

fltMemoryArrayVoltageThresholdNonRecoverable

Fault Code: F0191

Message:

Memory array [id] on server [chassisId]/[slotId] voltage: [voltage]

Memory array [id] on server [id] voltage: [voltage]

Explanation:

This fault occurs when the memory array voltage has exceeded the specified hardware voltage rating and the memory hardware may be damaged or at risk of damage.

Recommended Action:

If you see this fault, take the following actions:

Step 1 If the SEL is enabled, review the SEL statistics on the DIMM to determine which threshold was crossed.

Step 2 Monitor the memory array for further degradation.

Step 3 Replace the power supply.

Step 4 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.

Fault Details:

Severity: critical

Cause: voltage-problem

mibFaultCode: 191

mibFaultName: fltMemoryArrayVoltageThresholdNonRecoverable

moClass: memory:Array

Type: environmental

fltMemoryUnitIdentityUnestablishable

Fault Code: F0502

Message:

DIMM [location] on server [chassisId]/[slotId] has an invalid FRU

DIMM [location] on server [id] has an invalid FRU

Explanation:

This fault typically occurs when a sensor has detected an unsupported DIMM in the server. For example, the model, vendor, or revision is not recognized.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Verify if the DIMM is supported on the server configuration.

Step 2 If the above action did not resolve the issue, you may have unsupported DIMMs or DIMM configuration in the server. Contact Cisco TAC.

Fault Details:

Severity: warning

Cause: identity-unestablishable

mibFaultCode: 502

mibFaultName: fltMemoryUnitIdentityUnestablishable

moClass: memory:Unit

Type: equipment

Processor-Related Faults

fltProcessorUnitInoperable

Fault Code: F0174

Message:

Processor [id] on server [chassisId]/[slotId] operability: [operability]

Explanation:

This fault occurs in the event the processor encounters a catastrophic error or has exceeded pre-set thermal/power thresholds.

Recommended Action:

If you see this fault, take the following actions:

Step 1 If the indicated probable cause is a thermal problem, verify that the airflow to the server is not obstructed and that the server is adequately ventilated. If possible, check that the heat sink is properly seated on the processor.

Step 2 If the indicated probable cause is equipment inoperable, contact Cisco TAC for further instructions.

Step 3 If the indicated probable cause is a power or voltage problem, check whether an alternate power supply resolves the issue. If it does not, contact Cisco TAC.
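The three steps above branch on the probable cause, which can be sketched as a lookup table. The cause labels used as keys below are hypothetical shorthand chosen for this example, not values reported by the system, and the fallback text is likewise an assumption.

```python
# Hypothetical cause labels mapped to the actions described in the steps above.
PROCESSOR_INOPERABLE_ACTIONS = {
    "thermal": (
        "Verify that airflow to the server is not obstructed, that the server "
        "is adequately ventilated, and that the heat sink is properly seated."
    ),
    "equipment-inoperable": "Contact Cisco TAC for further instructions.",
    "power-or-voltage": (
        "Try an alternate power supply. If that does not resolve the issue, "
        "contact Cisco TAC."
    ),
}


def recommended_action(probable_cause):
    """Return the action for a probable cause, or a fallback note."""
    return PROCESSOR_INOPERABLE_ACTIONS.get(
        probable_cause, "No action is listed for this cause."
    )
```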

Fault Details:

Severity: major

Cause: equipment-inoperable

mibFaultCode: 174

mibFaultName: fltProcessorUnitInoperable

moClass: processor:Unit

Type: equipment

fltProcessorUnitThermalNonCritical

Fault Code: F0175

Message:

Processor [id] on server [chassisId]/[slotId] temperature: [thermal]

Processor [id] on server [id] temperature: [thermal]

Explanation:

This fault occurs when the processor temperature on a rack server exceeds a non-critical threshold value, but is still below the critical threshold. Be aware of the following possible contributing factors:

•Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).

•If sensors on a CPU reach 179.6F (82C), the system will take that CPU offline.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Review the product specifications to determine the temperature operating range of the server.

Step 2 Review the Cisco UCS Site Preparation Guide to ensure the servers have adequate airflow, including front and back clearance.

Step 3 Verify that the air flows on the Cisco UCS chassis or rack server are not obstructed.

Step 4 Verify that the site cooling system is operating properly.

Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.

Step 6 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.

Fault Details:

Severity: minor

Cause: thermal-problem

mibFaultCode: 175

mibFaultName: fltProcessorUnitThermalNonCritical

moClass: processor:Unit

Type: environmental

fltProcessorUnitThermalThresholdCritical

Fault Code: F0176

Message:

Processor [id] on server [chassisId]/[slotId] temperature: [thermal]

Processor [id] on server [id] temperature: [thermal]

Explanation:

This fault occurs when the processor temperature on a rack server exceeds a critical threshold value. Be aware of the following possible contributing factors:

•Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).

•If sensors on a CPU reach 179.6F (82C), the system will take that CPU offline.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Review the product specifications to determine the temperature operating range of the server.

Step 2 Review the Cisco UCS Site Preparation Guide to ensure the servers have adequate airflow, including front and back clearance.

Step 3 Verify that the air flows on the Cisco UCS chassis or rack server are not obstructed.

Step 4 Verify that the site cooling system is operating properly.

Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.

Step 6 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.

Fault Details:

Severity: critical

Cause: thermal-problem

mibFaultCode: 176

mibFaultName: fltProcessorUnitThermalThresholdCritical

moClass: processor:Unit

Type: environmental

fltProcessorUnitThermalThresholdNonRecoverable

Fault Code: F0177

Message:

Processor [id] on server [chassisId]/[slotId] temperature: [thermal]

Processor [id] on server [id] temperature: [thermal]

Explanation:

This fault occurs when the processor temperature on a rack server has been out of the operating range, and the issue is not recoverable. Be aware of the following possible contributing factors:

•Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).

•If sensors on a CPU reach 179.6F (82C), the system will take that CPU offline.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Review the product specifications to determine the temperature operating range of the server.

Step 2 Review the Cisco UCS Site Preparation Guide to ensure the servers have adequate airflow, including front and back clearance.

Step 3 Verify that the air flows on the Cisco UCS chassis or rack server are not obstructed.

Step 4 Verify that the site cooling system is operating properly.

Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.

Step 6 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.

Fault Details:

Severity: non-recoverable

Cause: thermal-problem

mibFaultCode: 177

mibFaultName: fltProcessorUnitThermalThresholdNonRecoverable

moClass: processor:Unit

Type: environmental

fltProcessorUnitDisabled

Fault Code: F0842

Message:

Processor [id] on server [chassisId]/[slotId] operState: [operState]

Processor [id] on server [id] operState: [operState]

Explanation:

This fault occurs in the unlikely event that a processor is disabled.

Recommended Action:

If you see this fault, take the following actions:

Step 1 If this fault occurs, remove and reinsert the server into the chassis.

Step 2 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.

Fault Details:

Severity: info

Cause: equipment-disabled

mibFaultCode: 842

mibFaultName: fltProcessorUnitDisabled

moClass: processor:Unit

Type: environmental

Power Supply-Related Faults

fltEquipmentPsuInoperable

Fault Code: F0374

Message:

Power supply [id] in server [id] operability: [operability]

Explanation:

This fault typically occurs when the power supply unit is either offline or the input/output voltage is out of range.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Verify that the power cord is properly connected to the PSU and the power source.

Step 2 Verify that the power source is 220 volts.

Step 3 Remove the PSU and reinstall it.

Step 4 Replace the PSU.

Step 5 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.

Fault Details:

Severity: major

Cause: equipment-inoperable

mibFaultCode: 374

mibFaultName: fltEquipmentPsuInoperable

moClass: equipment:Psu

Type: equipment

fltEquipmentPsuThermalThresholdNonCritical

Fault Code: F0381

Message:

Power supply [id] in server [id] temperature: [thermal]

Explanation:

This fault occurs when the temperature of a PSU module has exceeded a non-critical threshold value, but is still below the critical threshold. Be aware of the following possible contributing factors:

•Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).

Recommended Action:

If you see this fault, take the following actions:

Step 1 Review the product specifications to determine the temperature operating range of the PSU module.

Step 2 Review the Cisco UCS Site Preparation Guide to ensure the PSU modules have adequate airflow, including front and back clearance.

Step 3 Verify that the air flows are not obstructed.

Step 4 Verify that the site cooling system is operating properly.

Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.

Step 6 Replace faulty PSU modules.

Step 7 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.

Fault Details:

Severity: minor

Cause: thermal-problem

mibFaultCode: 381

mibFaultName: fltEquipmentPsuThermalThresholdNonCritical

moClass: equipment:Psu

Type: environmental
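
The inlet air temperature range stated above, 50F (10C) to 95F (35C), can be checked in a monitoring script. This is a minimal sketch; the constant and function names are hypothetical, and how the inlet reading is obtained is outside its scope.

```python
INLET_MIN_C = 10.0  # 50F, lower limit stated in this guide
INLET_MAX_C = 35.0  # 95F, upper limit stated in this guide


def fahrenheit_to_celsius(temp_f):
    """Convert a Fahrenheit reading to Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def inlet_temperature_ok(temp_c):
    """Return True if the inlet temperature is within the stated range."""
    return INLET_MIN_C <= temp_c <= INLET_MAX_C


print(inlet_temperature_ok(fahrenheit_to_celsius(77.0)))   # 25C, inside the range
print(inlet_temperature_ok(fahrenheit_to_celsius(104.0)))  # 40C, above the range
```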

fltEquipmentPsuThermalThresholdCritical

Fault Code: F0383

Message:

[thermal]Power supply [id] in server [id] temperature: [thermal]

Explanation:

This fault occurs when the temperature of a PSU module has exceeded a critical threshold value. Be aware of the following possible contributing factors:

•Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).

Recommended Action:

If you see this fault, take the following actions:

Step 1 Review the product specifications to determine the temperature operating range of the PSU module.

Step 2 Review the Cisco UCS Site Preparation Guide to ensure the PSU modules have adequate airflow, including front and back clearance.

Step 3 Verify that the air flows are not obstructed.

Step 4 Verify that the site cooling system is operating properly.

Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.

Step 6 Replace faulty PSU modules.

Step 7 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.

Fault Details:

Severity: warning

Cause: thermal-problem

mibFaultCode: 383

mibFaultName: fltEquipmentPsuThermalThresholdCritical

moClass: equipment:Psu

Type: environmental

fltEquipmentPsuMissing

Fault Code: F0378

Message:

[presence]Power supply [id] in server [id] presence: [presence]

Explanation:

This fault typically occurs when the power supply module is either missing or the input power to the server is absent.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Check to see if the power supply is connected to a power source.

Step 2 If the PSU is physically present in the slot, remove and then reinsert it.

Step 3 If the PSU is not physically present in the slot, insert a new PSU.

Step 4 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.

Fault Details:

Severity: warning

Cause: equipment-missing

mibFaultCode: 378

mibFaultName: fltEquipmentPsuMissing

moClass: equipment:Psu

Type: equipment

fltEquipmentPsuThermalThresholdNonRecoverable

Fault Code: F0385

Message:

[thermal]Power supply [id] in server [id] temperature: [thermal]

Explanation:

This fault occurs when the temperature of a PSU module has been out of operating range, and the issue is not recoverable. Be aware of the following possible contributing factors:

•Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).

Recommended Action:

If you see this fault, take the following actions:

Step 1 Review the product specifications to determine the temperature operating range of the PSU module.

Step 2 Review the Cisco UCS Site Preparation Guide to ensure the PSU modules have adequate airflow, including front and back clearance.

Step 3 Verify that the air flows are not obstructed.

Step 4 Verify that the site cooling system is operating properly.

Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.

Step 6 Replace faulty PSU modules.

Step 7 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.

Fault Details:

Severity: major

Cause: thermal-problem

mibFaultCode: 385

mibFaultName: fltEquipmentPsuThermalThresholdNonRecoverable

moClass: equipment:Psu

Type: environmental

fltEquipmentPsuVoltageThresholdCritical

Fault Code: F0389

Message:

[voltage]Power supply [id] in server [id] voltage: [voltage]

Explanation:

This fault occurs when the PSU voltage has exceeded the specified hardware voltage rating.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Remove and reseat the PSU.

Step 2 If the above action did not resolve the issue, create a tech-support file and contact Cisco TAC.

Fault Details:

Severity: warning

Cause: voltage-problem

mibFaultCode: 389

mibFaultName: fltEquipmentPsuVoltageThresholdCritical

moClass: equipment:Psu

Type: environmental

fltEquipmentPsuVoltageThresholdNonRecoverable

Fault Code: F0391

Message:

[voltage]Power supply [id] in server [id] voltage: [voltage]

Explanation:

This fault occurs when the PSU voltage has exceeded the specified hardware voltage rating and PSU hardware may have been damaged as a result or may be at risk of being damaged.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Remove and reseat the PSU.

Step 2 If the above action did not resolve the issue, create a tech-support file and contact Cisco TAC.

Fault Details:

Severity: major

Cause: voltage-problem

mibFaultCode: 391

mibFaultName: fltEquipmentPsuVoltageThresholdNonRecoverable

moClass: equipment:Psu

Type: environmental

fltEquipmentPsuPerfThresholdNonCritical

Fault Code: F0392

Message:

[perf]Power supply [id] in server [id] output power: [perf]

Explanation:

This fault is raised as a warning if the current output of the PSU in a rack server does not match the desired output value.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Monitor the PSU status.

Step 2 If possible, remove and reseat the PSU.

Step 3 If the above action did not resolve the issue, create a tech-support file for the chassis, and contact Cisco TAC.

Fault Details:

Severity: minor

Cause: power-problem

mibFaultCode: 392

mibFaultName: fltEquipmentPsuPerfThresholdNonCritical

moClass: equipment:Psu

Type: equipment

fltEquipmentPsuPerfThresholdCritical

Fault Code: F0393

Message:

[perf]Power supply [id] in server [id] output power: [perf]

Explanation:

This fault is raised as a warning if the current output of the PSU in a rack server does not match the desired output value.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Monitor the PSU status.

Step 2 If possible, remove and reseat the PSU.

Step 3 If the above action did not resolve the issue, create a tech-support file for the chassis, and contact Cisco TAC.

Fault Details:

Severity: warning

Cause: power-problem

mibFaultCode: 393

mibFaultName: fltEquipmentPsuPerfThresholdCritical

moClass: equipment:Psu

Type: equipment

fltEquipmentPsuPerfThresholdNonRecoverable

Fault Code: F0394

Message:

[perf] Power supply [id] in server [id] output power: [perf]

Explanation:

This fault is raised as a warning if the current output of the PSU in a rack server does not match the desired output value.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Monitor the PSU status.

Step 2 If possible, remove and reseat the PSU.

Step 3 If the above action did not resolve the issue, create a tech-support file for the chassis, and contact Cisco TAC.

Fault Details:

Severity: major

Cause: power-problem

mibFaultCode: 394

mibFaultName: fltEquipmentPsuPerfThresholdNonRecoverable

moClass: equipment:Psu

Type: equipment

fltEquipmentPsuIdentity

Fault Code: F0407

Message:

Power supply [id] on chassis [id] has a malformed FRU

Power supply [id] on server [id] has a malformed FRU

Explanation:

This fault typically occurs when the FRU information for a power supply unit is corrupted or malformed.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Verify the vendor specification for the power supply.

Step 2 If the above action did not resolve the issue, create a tech-support file and contact Cisco TAC.

Fault Details:

Severity: critical

Cause: fru-problem

mibFaultCode: 407

mibFaultName: fltEquipmentPsuIdentity

moClass: equipment:Psu

Type: equipment

fltPowerChassisMemberChassisPsuRedundanceFailure

Fault Code: F0743

Message:

Chassis [id] was configured for redundancy, but running in a non-redundant configuration.

Explanation:

This fault typically occurs when chassis power redundancy has failed.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Consider adding more PSUs to the chassis.

Step 2 Replace any non-functional PSUs.

Step 3 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.

Fault Details:

Severity: major

Cause: psu-redundancy-fail

mibFaultCode: 743

mibFaultName: fltPowerChassisMemberChassisPsuRedundanceFailure

moClass: power:ChassisMember

Type: environmental

fltEquipmentPsuPowerThreshold

Fault Code: F0882

Message:

Power supply [id] on chassis [id] has exceeded its power threshold

Power supply [id] on server [id] has exceeded its power threshold.

Explanation:

This fault occurs when a power supply unit is drawing too much current.

Recommended Action:

If you see this fault, create a tech-support file and contact Cisco TAC.

Fault Details:

Severity: critical

Cause: power-problem

mibFaultCode: 882

mibFaultName: fltEquipmentPsuPowerThreshold

moClass: equipment:Psu

Type: equipment

fltEquipmentPsuInputError

Fault Code: F0883

Message:

Power supply [id] on chassis [id] has disconnected cable or bad input voltage

Power supply [id] on server [id] has disconnected cable or bad input voltage.

Explanation:

This fault occurs when a power cable is disconnected or input voltage is incorrect.

Recommended Action:

If you see this fault, create a tech-support file and contact Cisco TAC.

Fault Details:

Severity: critical

Cause: power-problem

mibFaultCode: 883

mibFaultName: fltEquipmentPsuInputError

moClass: equipment:Psu

Type: equipment
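
When several power supply faults are active at once, it can help to triage them by severity. The sketch below sorts a few of the faults from this section, most severe first. The fault codes, names, and severities are taken from the records above; the ranking info < warning < minor < major < critical is an assumption about how the severity levels are ordered and should be confirmed for your release. The names SEVERITY_RANK and most_severe_first are hypothetical.

```python
# Assumed ordering of severity levels, lowest to highest.
SEVERITY_RANK = {"info": 0, "warning": 1, "minor": 2, "major": 3, "critical": 4}

# (fault code, mibFaultName, severity) as listed in this section.
PSU_FAULTS = [
    ("F0374", "fltEquipmentPsuInoperable", "major"),
    ("F0381", "fltEquipmentPsuThermalThresholdNonCritical", "minor"),
    ("F0383", "fltEquipmentPsuThermalThresholdCritical", "warning"),
    ("F0378", "fltEquipmentPsuMissing", "warning"),
    ("F0407", "fltEquipmentPsuIdentity", "critical"),
    ("F0883", "fltEquipmentPsuInputError", "critical"),
]


def most_severe_first(faults):
    """Return faults sorted by descending severity; ties keep input order."""
    return sorted(faults, key=lambda fault: SEVERITY_RANK[fault[2]], reverse=True)


for code, name, severity in most_severe_first(PSU_FAULTS):
    print(code, severity, name)
```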

Server-Related Faults

fltComputeBoardPowerError

Fault Code: F0310

Message:

Motherboard of server [chassisId]/[slotId] (service profile: [assignedToDn]) power: [operPower]

Motherboard of server [id] (service profile: [assignedToDn]) power: [operPower]

Explanation:

This fault typically occurs when the server power sensors have detected a problem.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Reseat/replace the power supply.

Step 2 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.

Fault Details:

Severity: major

Cause: power-problem

mibFaultCode: 310

mibFaultName: fltComputeBoardPowerError

moClass: compute:Board

Type: environmental
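
The Message field of each fault is a template whose bracketed names, such as [id] or [operPower], are replaced with live values when the fault is raised. The sketch below illustrates that substitution for the rack-server form of the message above. The function name render_message and the sample values are hypothetical; the actual values are supplied by CIMC.

```python
import re


def render_message(template, values):
    """Replace each [name] placeholder in template with values[name]."""
    def substitute(match):
        name = match.group(1)
        # Leave unknown placeholders untouched rather than guessing a value.
        return str(values.get(name, match.group(0)))

    return re.sub(r"\[(\w+)\]", substitute, template)


template = (
    "Motherboard of server [id] (service profile: [assignedToDn]) "
    "power: [operPower]"
)

# Sample values are illustrative only.
print(render_message(template, {"id": 1, "operPower": "error"}))
```

Placeholders without a supplied value, such as [assignedToDn] in this call, are left as written so that missing data stays visible.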

fltComputePhysicalBiosPostTimeout

Fault Code: F0313

Message:

Server [id] (service profile: [assignedToDn]) BIOS failed power-on self test

Server [chassisId]/[slotId] (service profile: [assignedToDn]) BIOS failed power-on self test.

Explanation:

This fault typically occurs when the server has encountered a diagnostic failure.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Connect to the CIMC WebUI and use the KVM to record where the POST failure occurred.

Step 2 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.

Fault Details:

Severity: critical

Cause: equipment-inoperable

mibFaultCode: 313

mibFaultName: fltComputePhysicalBiosPostTimeout

moClass: compute:Physical

Type: equipment

fltComputeBoardCmosVoltageThresholdCritical

Fault Code: F0424

Message:

Possible loss of CMOS settings: CMOS battery voltage on server [chassisId]/[slotId] is [cmosVoltage]

Possible loss of CMOS settings: CMOS battery voltage on server [id] is [cmosVoltage]

Explanation:

This fault is raised when the CMOS battery voltage has dropped to lower than the normal operating range. This could impact the clock and other CMOS settings.

Recommended Action:

If you see this fault, replace the battery.

Fault Details:

Severity: critical

Cause: voltage-problem

mibFaultCode: 424

mibFaultName: fltComputeBoardCmosVoltageThresholdCritical

moClass: compute:Board

Type: environmental

fltComputeBoardCmosVoltageThresholdNonRecoverable

Fault Code: F0425

Message:

Possible loss of CMOS settings: CMOS battery voltage on server [chassisId]/[slotId] is [cmosVoltage]

Possible loss of CMOS settings: CMOS battery voltage on server [id] is [cmosVoltage]

Explanation:

This fault is raised when the CMOS battery voltage has dropped quite low and is unlikely to recover. This impacts the clock and other CMOS settings.

Recommended Action:

If you see this fault, replace the battery.

Fault Details:

Severity: major

Cause: voltage-problem

mibFaultCode: 425

mibFaultName: fltComputeBoardCmosVoltageThresholdNonRecoverable

moClass: compute:Board

Type: environmental

fltComputeIOHubThermalNonCritical

Fault Code: F0538

Message:

IO Hub on server [chassisId]/[slotId] temperature: [thermal]

Explanation:

This fault is raised when the IO controller temperature is outside the upper or lower non-critical threshold.

Recommended Action:

If you see this fault, monitor other environmental events related to this server and ensure the temperature ranges are within recommended ranges.

Fault Details:

Severity: minor

Cause: thermal-problem

mibFaultCode: 538

mibFaultName: fltComputeIOHubThermalNonCritical

moClass: compute:IOHub

Type: environmental

fltComputeIOHubThermalThresholdCritical

Fault Code: F0539

Message:

IO Hub on server [chassisId]/[slotId] temperature: [thermal]

Explanation:

This fault is raised when the IO controller temperature is outside the upper or lower critical threshold.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Monitor other environmental events related to the server and ensure the temperature ranges are within recommended ranges.

Step 2 Consider turning off the server for a while if possible.

Step 3 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.

Fault Details:

Severity: major

Cause: thermal-problem

mibFaultCode: 539

mibFaultName: fltComputeIOHubThermalThresholdCritical

moClass: compute:IOHub

Type: environmental

fltComputeIOHubThermalThresholdNonRecoverable

Fault Code: F0540

Message:

IO Hub on server [chassisId]/[slotId] temperature: [thermal]

Explanation:

This fault is raised when the IO controller temperature is outside the recoverable range of operation.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Shut down the server immediately.

Step 2 Create a tech-support file and contact Cisco TAC.

Fault Details:

Severity: critical

Cause: thermal-problem

mibFaultCode: 540

mibFaultName: fltComputeIOHubThermalThresholdNonRecoverable

moClass: compute:IOHub

Type: environmental

fltComputePhysicalPostFailure

Fault Code: F0517

Message:

Server [id] POST or diagnostic failure

Server [chassisId]/[slotId] POST or diagnostic failure.

Explanation:

This fault typically occurs when the server has encountered a diagnostic failure or an error during POST.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Check the POST result for the server.

Step 2 Reboot the server.

Step 3 If the above actions did not resolve the issue, create a tech-support file and contact Cisco Technical Support.

Fault Details:

Severity: major

Cause: equipment-problem

mibFaultCode: 517

mibFaultName: fltComputePhysicalPostFailure

moClass: compute:Physical

Type: server

fltComputeBoardPowerFail

Fault Code: F0868

Message:

[power]Motherboard of server [id] power: [power]

Explanation:

This fault typically occurs when the power sensors on a server detect a problem.

Recommended Action:

If you see this fault, create a tech-support file and contact Cisco TAC.

Fault Details:

Severity: critical

Cause: power-problem

mibFaultCode: 868

mibFaultName: fltComputeBoardPowerFail

moClass: compute:Board

Type: environmental

fltComputeBoardThermalProblem

Fault Code: F0869

Message:

Motherboard of server [chassisId]/[slotId] (service profile: [assignedToDn]) thermal: [thermal]

Motherboard of server [id] (service profile: [assignedToDn]) thermal: [thermal]

Explanation:

This fault typically occurs when the motherboard thermal sensors on a server detect a problem.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Verify that the server fans are working properly.

Step 2 Wait for 24 hours to see if the problem resolves itself.

Step 3 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.

Fault Details:

Severity: major

Cause: thermal-problem

mibFaultCode: 869

mibFaultName: fltComputeBoardThermalProblem

moClass: compute:Board

Type: environmental

fltComputeBoardMotherBoardVoltageUpperThresholdCritical

Fault Code: F0920

Message:

"sys/rack-unit-1/board"

Explanation:

This fault typically occurs when one or more motherboard input voltages have exceeded upper critical thresholds.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Reseat or replace the power supply.

Step 2 If the issue persists, create a tech-support file and contact Cisco TAC.

Fault Details:

Severity: major

Cause: voltage-problem

mibFaultCode: 920

mibFaultName: fltComputeBoardMotherBoardVoltageUpperThresholdCritical

moClass: compute:Board

Type: environmental

fltComputeBoardPowerUsageProblem

Fault Code: F1040

Message:

"sys/rack-unit-1/board"

Explanation:

This fault typically occurs when the motherboard power consumption exceeds certain threshold limits. When this happens, the power usage sensors on a server detect a problem.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Contact Cisco TAC.

Fault Details:

Severity: warning

Cause: power-problem

mibFaultCode: 1040

mibFaultName: fltComputeBoardPowerUsageProblem

moClass: compute:Board

Type: environmental

fltComputeBoardMotherBoardVoltageThresholdUpperNonRecoverable

Fault Code: F0918

Message:

"sys/rack-unit-1/board"

Explanation:

This fault typically occurs when one or more motherboard input voltages have become too high and are unlikely to recover.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Contact Cisco TAC.

Fault Details:

Severity: critical

Cause: voltage-problem

mibFaultCode: 918

mibFaultName: fltComputeBoardMotherBoardVoltageThresholdUpperNonRecoverable

moClass: compute:Board

Type: environmental

fltComputeBoardMotherBoardVoltageThresholdLowerNonRecoverable

Fault Code: F0919

Message:

"sys/rack-unit-1/board"

Explanation:

This fault typically occurs when one or more motherboard input voltages have dropped too low and are unlikely to recover.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Contact Cisco TAC.

Fault Details:

Severity: critical

Cause: voltage-problem

mibFaultCode: 919

mibFaultName: fltComputeBoardMotherBoardVoltageThresholdLowerNonRecoverable

moClass: compute:Board

Type: environmental

fltComputeBoardMotherBoardVoltageLowerThresholdCritical

Fault Code: F0921

Message:

"sys/rack-unit-1/board"

Explanation:

This fault typically occurs when one or more motherboard input voltages have crossed lower critical thresholds.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Reseat or replace the power supply.

Step 2 If the issue persists, create a tech-support file and contact Cisco TAC.

Fault Details:

Severity: major

Cause: voltage-problem

mibFaultCode: 921

mibFaultName: fltComputeBoardMotherBoardVoltageLowerThresholdCritical

moClass: compute:Board

Type: environmental

fltMemoryUnitECCThresholdNonCritical

Fault Code: F2500

Message:

"sys/rack-unit-1/board/memarray-%d/mem-%d"

Explanation:

This fault indicates that the memory DIMM has crossed a non-critical threshold of reported ECC errors.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Continue to monitor the ECC errors reported by the memory DIMM. If they exceed the non-recoverable threshold, replace the memory DIMM.

Step 2 Monitor the server for temperature/voltage thresholds.

Fault Details:

Severity: minor

Cause: equipment-degraded

mibFaultCode: 2500

mibFaultName: fltMemoryUnitECCThresholdNonCritical

moClass: memory:Unit

Type: equipment
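
The Message for this fault is a distinguished name (DN) pattern with printf-style %d fields. The sketch below fills in such a pattern with Python's % operator. It assumes that the first %d is the memory array index and the second is the DIMM index, which is inferred from the names memarray and mem rather than stated in this guide; the function name memory_dn is hypothetical.

```python
MEMORY_DN_FORMAT = "sys/rack-unit-1/board/memarray-%d/mem-%d"


def memory_dn(array_id, dimm_id):
    """Build the DN for one DIMM from its array and DIMM indexes."""
    return MEMORY_DN_FORMAT % (array_id, dimm_id)


print(memory_dn(1, 3))
```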

fltMemoryUnitECCThresholdCritical

Fault Code: F2501

Message:

"sys/rack-unit-1/board/memarray-%d/mem-%d"

Explanation:

This fault indicates that the memory DIMM has crossed a critical threshold of reported ECC errors.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Continue to monitor the ECC errors reported by the memory DIMM. If they exceed the non-recoverable threshold, replace the memory DIMM.

Step 2 Monitor the server for temperature/voltage thresholds.

Fault Details:

Severity: warning

Cause: equipment-degraded

mibFaultCode: 2501

mibFaultName: fltMemoryUnitECCThresholdCritical

moClass: memory:Unit

Type: equipment

fltMemoryUnitECCThresholdNonRecoverable

Fault Code: F2502

Message:

"sys/rack-unit-1/board/memarray-%d/mem-%d"

Explanation:

This fault indicates that the memory DIMM has crossed a non-recoverable threshold of reported ECC errors.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Replace the memory DIMM.

Fault Details:

Severity: major

Cause: equipment-inoperable

mibFaultCode: 2502

mibFaultName: fltMemoryUnitECCThresholdNonRecoverable

moClass: memory:Unit

Type: equipment
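
The three ECC threshold faults escalate from monitoring to replacement. The lookup table below summarizes the records above for use in a script. The codes, names, and severities come from this guide; the short action labels "monitor" and "replace" and the function name ecc_action are hypothetical shorthand for the Recommended Action text.

```python
ECC_FAULTS = {
    "F2500": {
        "name": "fltMemoryUnitECCThresholdNonCritical",
        "severity": "minor",
        "action": "monitor",
    },
    "F2501": {
        "name": "fltMemoryUnitECCThresholdCritical",
        "severity": "warning",
        "action": "monitor",
    },
    "F2502": {
        "name": "fltMemoryUnitECCThresholdNonRecoverable",
        "severity": "major",
        "action": "replace",
    },
}


def ecc_action(fault_code):
    """Return the shorthand action for an ECC fault code, or 'unknown'."""
    entry = ECC_FAULTS.get(fault_code)
    if entry is None:
        return "unknown"
    return entry["action"]


print(ecc_action("F2502"))
```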

Storage-Related Faults

fltStorageLocalDiskInoperable

Fault Code: F0181

Message:

Local disk [id] on server [chassisId]/[slotId] operability: [operability]

Local disk [id] on server [id] operability: [operability]

Explanation:

This fault occurs when the local disk has become inoperable or has been removed while the server was in use.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Insert the disk in a supported slot.

Step 2 Remove and reinsert the local disk.

Step 3 Replace the disk, if an additional disk is available.

Step 4 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.

Fault Details:

Severity: major

Cause: equipment-inoperable

mibFaultCode: 181

mibFaultName: fltStorageLocalDiskInoperable

moClass: storage:LocalDisk

Type: equipment

fltStorageRaidBatteryInoperable

Fault Code: F0531

Message:

RAID Battery on server [chassisId]/[slotId] operability: [operability]

RAID Battery on server [id] operability: [operability]

Explanation:

This fault occurs when the RAID battery voltage is below the normal operating range.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Replace the RAID battery.

Step 2 If the above action did not resolve the issue, create a tech-support file and contact Cisco TAC.

Fault Details:

Severity: major

Cause: equipment-inoperable

mibFaultCode: 531

mibFaultName: fltStorageRaidBatteryInoperable

moClass: storage:RaidBattery

Type: equipment

fltStorageLocalDiskCopybackFailed

Fault Code: F0978

Message:

"sys/rack-unit-1/board/storage-%s-ctlr-%d/pd-%d"

Explanation:

This fault indicates a physical disk copyback failure. This fault could indicate a physical drive problem or an issue with the RAID configuration.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Replace the physical drive and check to see if the issue is resolved after a rebuild.

Step 2 Reseat or replace the storage controller.

Step 3 Check configuration options for the storage controller in the MegaRAID ROM configuration page.

Fault Details:

Severity: warning

Cause: equipment-offline

mibFaultCode: 978

mibFaultName: fltStorageLocalDiskCopybackFailed

moClass: storage:LocalDisk

Type: equipment

fltStorageRaidBatteryDegraded

Fault Code: F0969

Message:

"sys/rack-unit-1/board/storage-%s-ctlr-%d/raid-battery-%d"

Explanation:

This fault indicates a controller battery backup unit failure.

Recommended Action:

If you see this fault, take the following action:

Step 1 Reseat or replace the battery backup unit on the storage controller.

Fault Details:

Severity: warning

Cause: equipment-degraded

mibFaultCode: 969

mibFaultName: fltStorageRaidBatteryDegraded

moClass: storage:RaidBattery

Type: equipment

fltStorageRaidBatteryRelearnAborted

Fault Code: F0970

Message:

"sys/rack-unit-1/board/storage-%s-ctlr-%d/raid-battery-%d"

Explanation:

This fault indicates that a controller battery relearn was aborted.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Restart the relearn process for the battery backup unit.

Step 2 Reseat or replace the battery backup unit.

Step 3 Replace the battery backup unit if it has exceeded 100 relearn cycles.

Fault Details:

Severity: info

Cause: equipment-degraded

mibFaultCode: 970

mibFaultName: fltStorageRaidBatteryRelearnAborted

moClass: storage:RaidBattery

Type: equipment

fltStorageRaidBatteryRelearnFailed

Fault Code: F0971

Message:

"sys/rack-unit-1/board/storage-%s-ctlr-%d/raid-battery-%d"

Explanation:

This fault indicates a controller battery relearn failure.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Restart the relearn process for the battery backup unit.

Step 2 Reseat or replace the battery backup unit.

Step 3 Replace the battery backup unit if it has exceeded 100 relearn cycles.

Fault Details:

Severity: warning

Cause: equipment-degraded

mibFaultCode: 971

mibFaultName: fltStorageRaidBatteryRelearnFailed

moClass: storage:RaidBattery

Type: equipment

fltStorageVirtualDriveConsistencyCheckFailed

Fault Code: F0982

Message:

"sys/rack-unit-1/board/storage-%s-ctlr-%d/vd-%d"

Explanation:

This fault indicates a consistency check failure with the virtual drive.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Initiate a consistency check on the virtual drive.

Step 2 Replace any faulty physical drives.

Fault Details:

Severity: warning

Cause: equipment-degraded

mibFaultCode: 982

mibFaultName: fltStorageVirtualDriveConsistencyCheckFailed

moClass: storage:VirtualDrive

Type: equipment

fltStorageVirtualDriveDegraded

Fault Code: F1008

Message:

"sys/rack-unit-1/board/storage-%s-ctlr-%d/vd-%d"

Explanation:

This fault indicates a recoverable error with the virtual drive.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Initiate a consistency check on the virtual drive.

Step 2 Replace any faulty physical drives.

Fault Details:

Severity: warning

Cause: equipment-degraded

mibFaultCode: 1008

mibFaultName: fltStorageVirtualDriveDegraded

moClass: storage:VirtualDrive

Type: equipment

fltStorageVirtualDriveInoperable

Fault Code: F1007

Message:

"sys/rack-unit-1/board/storage-%s-ctlr-%d/vd-%d"

Explanation:

This fault indicates a non-recoverable error with the virtual drive.

Recommended Action:

If you see this fault, take the following actions:

Step 1 If the data on the drive is accessible, back up and recreate the virtual drive.

Step 2 Replace any faulty physical drives.

Step 3 Check for controller errors in the MegaRAID ROM page logs.

Fault Details:

Severity: major

Cause: equipment-inoperable

mibFaultCode: 1007

mibFaultName: fltStorageVirtualDriveInoperable

moClass: storage:VirtualDrive

Type: equipment

fltStorageVirtualDriveReconstructionFailed

Fault Code: F0981

Message:

"sys/rack-unit-1/board/storage-%s-ctlr-%d/vd-%d"

Explanation:

This fault indicates a failure in the reconstruction process of the virtual drive.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Restart the reconstruction process.

Fault Details:

Severity: warning

Cause: equipment-degraded

mibFaultCode: 981

mibFaultName: fltStorageVirtualDriveReconstructionFailed

moClass: storage:VirtualDrive

Type: equipment

fltStorageControllerInoperable

Fault Code: F0976

Message:

"sys/rack-unit-1/board/storage-%s-ctlr-%d"

Explanation:

This fault indicates a non-recoverable storage controller failure. This happens when the storage system cannot contact the controller for a period of time, after which it gives up, and raises this fault.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Reseat or replace the storage controller.

Fault Details:

Severity: warning

Cause: equipment-inoperable

mibFaultCode: 976

mibFaultName: fltStorageControllerInoperable

moClass: storage:Controller

Type: equipment

fltStorageControllerPatrolReadFailed

Fault Code: F1003

Message:

"sys/rack-unit-1/board/storage-%s-ctlr-%d"

Explanation:

This fault indicates that the review of the storage system for potential physical disk errors has failed.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Initiate a consistency check on the virtual drive.

Step 2 Replace any faulty physical drives.

Fault Details:

Severity: warning

Cause: equipment-inoperable

mibFaultCode: 1003

mibFaultName: fltStorageControllerPatrolReadFailed

moClass: storage:Controller

Type: equipment
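
Several storage faults in this section report a DN built from the pattern "sys/rack-unit-1/board/storage-%s-ctlr-%d", optionally followed by a pd, vd, or raid-battery component. The sketch below parses such a DN back into its fields with a regular expression. The function name, the field names, and the sample controller type "SAS" are hypothetical; only the DN patterns themselves come from this guide.

```python
import re

# Matches the controller DN and an optional pd/vd/raid-battery child.
STORAGE_DN = re.compile(
    r"^sys/rack-unit-1/board/"
    r"storage-(?P<ctlr_type>[^/]+?)-ctlr-(?P<ctlr_id>\d+)"
    r"(?:/(?P<kind>pd|vd|raid-battery)-(?P<index>\d+))?$"
)


def parse_storage_dn(dn):
    """Return the DN fields as a dict, or None if the DN does not match."""
    match = STORAGE_DN.match(dn)
    if match is None:
        return None
    fields = match.groupdict()
    fields["ctlr_id"] = int(fields["ctlr_id"])
    if fields["index"] is not None:
        fields["index"] = int(fields["index"])
    return fields


# "SAS" and the numeric values are illustrative only.
print(parse_storage_dn("sys/rack-unit-1/board/storage-SAS-ctlr-2/vd-0"))
```

For a controller-level DN without a child component, the kind and index fields are returned as None.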

System Event Log-Related Faults

fltSysdebugMEpLogMEpLogVeryLow

Fault Code: F0461

Message:

Log capacity on Management Controller on server [id] is [capacity]

Explanation:

This fault typically occurs because Cisco Integrated Management Controller (CIMC) has detected that the system event log (SEL) on the server is almost full. The available capacity in the log is very low. This is an info-level fault and can be ignored if you do not want to clear the SEL at this time.

Recommended Action:

If you see this fault, you can clear the SEL, if desired.

Fault Details:

Severity: info

Cause: log-capacity

mibFaultCode: 461

mibFaultName: fltSysdebugMEpLogMEpLogVeryLow

moClass: sysdebug:MEpLog

Type: operational

fltSysdebugMEpLogMEpLogFull

Fault Code: F0462

Message:

Log capacity on Management Controller on server [id] is [capacity]

Explanation:

This fault typically occurs because Cisco CIMC could not transfer the SEL file to the location specified in the SEL policy. This is an info-level fault and can be ignored if you do not want to clear the SEL at this time.

Recommended Action:

If you see this fault, take the following actions:

Step 1 Verify the configuration of the SEL policy to ensure that the location, user, and password provided are correct.

Step 2 If you do want to transfer and clear the SEL and the above action did not resolve the issue, create a tech-support file and contact Cisco TAC.

Fault Details:

Severity: info

Cause: log-capacity

mibFaultCode: 462

mibFaultName: fltSysdebugMEpLogMEpLogFull

moClass: sysdebug:MEpLog

Type: operational
