    Faults Generated in CIMC


    This chapter provides information about the faults that may be raised and reported in the CIMC Web UI.

    This chapter includes the following sections:

    Chassis-Related Faults

    Fan-Related Faults

    I/O Module-Related Faults

    Memory-Related Faults

    Processor-Related Faults

    Power Supply-Related Faults

    Server-Related Faults

    Storage-Related Faults

    System Event Log-Related Faults

    Chassis-Related Faults

    fltEquipmentChassisThermalThresholdCritical

    Fault Code: F0409

    Message:

    Thermal condition on chassis [id] cause: [thermalStateQualifier]

    Explanation:

    This fault occurs when a component within the chassis is operating outside the safe thermal operating range.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Check the temperature readings for the IOM and ensure that they are within the recommended safe thermal operating range.

    Step 2 If the fault reports a "Thermal Sensor threshold crossing in IOM" error for one or both of the IOMs, check whether thermal faults have been raised against that IOM. Those faults include details of the thermal condition.

    Step 3 If the fault reports a "Missing or Faulty Fan" error, check the status of that fan. If it needs replacement, create a tech-support file for the chassis and contact Cisco TAC.

    Step 4 If the above actions did not resolve the issue and the condition persists, create a tech-support file for the chassis and contact Cisco TAC.
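    The thermalStateQualifier text in the message determines which of the steps above applies. The following Python sketch assumes that the cause field of the message carries the error text quoted in Steps 2 and 3 verbatim; the regular expression, the QUALIFIER_STEPS table, and the helper names are hypothetical illustrations, not part of CIMC.

```python
import re
from typing import Optional, Tuple

# Assumed message shape: "Thermal condition on chassis [id] cause: [thermalStateQualifier]"
MESSAGE_RE = re.compile(r"^Thermal condition on chassis (?P<id>\S+) cause: (?P<qualifier>.+)$")

# Hypothetical mapping from qualifier text to the troubleshooting step to follow.
QUALIFIER_STEPS = {
    "Thermal Sensor threshold crossing in IOM": 2,
    "Missing or Faulty Fan": 3,
}


def parse_message(message: str) -> Optional[Tuple[str, str]]:
    """Return (chassis_id, qualifier), or None if the text does not match."""
    match = MESSAGE_RE.match(message.strip())
    if match is None:
        return None
    return match.group("id"), match.group("qualifier")


def next_step(message: str) -> int:
    """Pick a step; unknown or unparsable messages fall back to Step 1."""
    parsed = parse_message(message)
    if parsed is None:
        return 1
    _, qualifier = parsed
    for text, step in QUALIFIER_STEPS.items():
        if text in qualifier:
            return step
    return 1


sample = "Thermal condition on chassis 1 cause: Missing or Faulty Fan"
print(parse_message(sample))  # ('1', 'Missing or Faulty Fan')
print(next_step(sample))      # 3
```

    Step 4 still applies whenever the step selected this way does not clear the fault.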


    Fault Details:

    Severity: major
    
    Cause: thermal-problem
    
    mibFaultCode: 409
    
    mibFaultName: fltEquipmentChassisThermalThresholdCritical
    
    moClass: equipment:Chassis
    
    Type: environmental
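    Each entry states the same number twice: as the F-prefixed Fault Code (F0409) and as the numeric mibFaultCode (409). The Python sketch below parses a Fault Details block into a dictionary and converts between the two forms. It assumes the four-digit, zero-padded "F" format seen throughout this chapter; the helper names are hypothetical.

```python
from typing import Dict


def parse_fault_details(text: str) -> Dict[str, str]:
    """Parse 'Key: value' lines of a Fault Details block into a dict."""
    details: Dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as "equipment:Chassis" stay intact.
        key, sep, value = line.partition(":")
        if sep and key.strip() and value.strip():
            details[key.strip()] = value.strip()
    return details


def fault_code_from_mib(mib_fault_code: int) -> str:
    """Format a mibFaultCode such as 409 as a Fault Code such as F0409."""
    return f"F{mib_fault_code:04d}"


def mib_from_fault_code(fault_code: str) -> int:
    """Convert a Fault Code such as F0409 back to its mibFaultCode."""
    if not (fault_code.startswith("F") and fault_code[1:].isdigit()):
        raise ValueError(f"unexpected fault code: {fault_code!r}")
    return int(fault_code[1:])


block = """Severity: major
Cause: thermal-problem
mibFaultCode: 409
mibFaultName: fltEquipmentChassisThermalThresholdCritical
moClass: equipment:Chassis
Type: environmental"""

details = parse_fault_details(block)
print(fault_code_from_mib(int(details["mibFaultCode"])))  # F0409
```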
    

    fltEquipmentChassisThermalThresholdNonCritical

    Fault Code: F0410

    Message:

    Thermal condition on chassis [id] cause: [thermalStateQualifier]

    Explanation:

    This fault occurs when a component within the chassis is operating outside the safe thermal operating range.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Check the temperature readings for the IOM and ensure that they are within the recommended safe thermal operating range.

    Step 2 If the fault reports a "Thermal Sensor threshold crossing in IOM" error for one or both of the IOMs, check whether thermal faults have been raised against that IOM. Those faults include details of the thermal condition.

    Step 3 If the fault reports a "Missing or Faulty Fan" error, check the status of that fan. If it needs replacement, create a tech-support file for the chassis and contact Cisco TAC.

    Step 4 If the above actions did not resolve the issue and the condition persists, create a tech-support file for the chassis and contact Cisco TAC.


    Fault Details:

    Severity: minor
    
    Cause: thermal-problem
    
    mibFaultCode: 410
    
    mibFaultName: fltEquipmentChassisThermalThresholdNonCritical
    
    moClass: equipment:Chassis
    
    Type: environmental
    

    fltEquipmentChassisThermalThresholdNonRecoverable

    Fault Code: F0411

    Message:

    Thermal condition on chassis [id] cause: [thermalStateQualifier]

    Explanation:

    This fault occurs when a component within the chassis is operating outside the safe thermal operating range.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Check the temperature readings for the IOM and ensure that they are within the recommended safe thermal operating range.

    Step 2 If the fault reports a "Thermal Sensor threshold crossing in IOM" error for one or both of the IOMs, check whether thermal faults have been raised against that IOM. Those faults include details of the thermal condition.

    Step 3 If the fault reports a "Missing or Faulty Fan" error, check the status of that fan. If it needs replacement, create a tech-support file for the chassis and contact Cisco TAC.

    Step 4 If the above actions did not resolve the issue and the condition persists, create a tech-support file for the chassis and contact Cisco TAC.


    Fault Details:

    Severity: critical
    
    Cause: thermal-problem
    
    mibFaultCode: 411
    
    mibFaultName: fltEquipmentChassisThermalThresholdNonRecoverable
    
    moClass: equipment:Chassis
    
    Type: environmental
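    The three chassis thermal faults escalate from minor (F0410) through major (F0409) to critical (F0411). When several faults are active at once, ranking them by severity helps decide which to address first. The Python sketch below assumes the ordering warning < minor < major < critical. Some entries in this chapter (for example, F0396 and F0187) list a severity that does not follow the threshold name, so the sketch ranks on the Severity field rather than on the fault name. The Severity enum and helper are hypothetical.

```python
from enum import IntEnum
from typing import List, Tuple


class Severity(IntEnum):
    """Assumed ranking of the severities used in this chapter, least to most urgent."""
    WARNING = 1
    MINOR = 2
    MAJOR = 3
    CRITICAL = 4


def most_severe_first(faults: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Sort (mibFaultName, severity) pairs so the most urgent fault comes first."""
    return sorted(faults, key=lambda fault: Severity[fault[1].upper()], reverse=True)


active = [
    ("fltEquipmentChassisThermalThresholdNonCritical", "minor"),
    ("fltEquipmentChassisThermalThresholdNonRecoverable", "critical"),
    ("fltEquipmentChassisThermalThresholdCritical", "major"),
]
for name, severity in most_severe_first(active):
    print(severity, name)
```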
    

    Fan-Related Faults

    fltEquipmentFanDegraded

    Fault Code: F0371

    Message:

    Fan [id] in Fan Module: [operability]
    Fan [id] in Fan Module [tray]-[id] under server [id] operability: [operability]

    Explanation:

    This fault occurs when one or more fans in a fan module are not operational, but at least one fan is operational.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Review the product specifications to determine the temperature operating range of the fan module.

    Step 2 Review the Cisco UCS Site Preparation Guide and ensure the fan module has adequate airflow, including front and back clearance.

    Step 3 Verify that the air flows are not obstructed.

    Step 4 Verify that the site cooling system is operating properly.

    Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.

    Step 6 Replace the faulty fan modules.

    Step 7 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


    Fault Details:

    Severity: minor
    
    Cause: equipment-degraded
    
    mibFaultCode: 371
    
    mibFaultName: fltEquipmentFanDegraded
    
    moClass: equipment:Fan
    
    Type: equipment
    

    fltEquipmentFanInoperable

    Fault Code: F0373

    Message:

    Fan [id] in Fan Module: [operability]
    Fan [id] in Fan Module [tray]-[id] under server [id] operability: [operability]

    Explanation:

    This fault occurs if a fan is not operational.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Remove and reinstall the fan module. Remove only one fan module at a time.

    Step 2 Replace the fan module with a different fan module.

    Step 3 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


    Fault Details:

    Severity: major
    
    Cause: equipment-inoperable
    
    mibFaultCode: 373
    
    mibFaultName: fltEquipmentFanInoperable
    
    moClass: equipment:Fan
    
    Type: equipment
    

    fltEquipmentFanModuleMissing

    Fault Code: F0377

    Message:

    Fan module [tray]-[id] in server [id] presence: [presence]

    Explanation:

    This fault occurs if a fan module slot is empty or the fan module has been removed from its slot.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 If the reported slot is empty, insert a fan module into the slot.

    Step 2 If the reported slot contains a fan module, remove and reinsert the fan module.

    Step 3 If the above action did not resolve the issue, create a tech-support file and contact Cisco TAC.


    Fault Details:

    Severity: warning
    
    Cause: equipment-missing
    
    mibFaultCode: 377
    
    mibFaultName: fltEquipmentFanModuleMissing
    
    moClass: equipment:FanModule
    
    Type: equipment
    

    fltEquipmentFanPerfThresholdNonCritical

    Fault Code: F0395

    Message:

    Fan [id] in Fan Module [tray]-[id] under server [id] speed: [perf]

    Explanation:

    This fault occurs when the fan speed reading from the fan controller does not match the desired fan speed and is outside of the normal operating range. This can indicate a problem with a fan or with the reading from the fan controller.
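    The three fan speed faults (F0395, F0396, and F0397) correspond to increasingly large gaps between the measured and the desired fan speed. The Python sketch below classifies a reading by its relative deviation. The 10%, 25%, and 50% cut-offs are hypothetical placeholders: the actual thresholds are set by the platform and are not given in this chapter.

```python
def classify_fan_speed(actual_rpm: float, desired_rpm: float,
                       non_critical: float = 0.10,
                       critical: float = 0.25,
                       non_recoverable: float = 0.50) -> str:
    """Classify how far a fan reading deviates from the desired speed.

    The threshold fractions are hypothetical placeholders, not CIMC values.
    """
    if desired_rpm <= 0:
        raise ValueError("desired_rpm must be positive")
    deviation = abs(actual_rpm - desired_rpm) / desired_rpm
    if deviation > non_recoverable:
        return "non-recoverable"  # compare F0397
    if deviation > critical:
        return "critical"         # compare F0396
    if deviation > non_critical:
        return "non-critical"     # compare F0395
    return "normal"


print(classify_fan_speed(9000, 8000))   # non-critical (12.5% off target)
print(classify_fan_speed(13000, 8000))  # non-recoverable (62.5% off target)
```

    A single out-of-range sample is not conclusive, which is why the recommended actions begin by monitoring the fan status before reseating or replacing it.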

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Monitor the fan status.

    Step 2 If the problem persists for a long period of time or if other fans do not show the same problem, reseat the fan.

    Step 3 Replace the fan module.

    Step 4 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


    Fault Details:

    Severity: minor
    
    Cause: performance-problem
    
    mibFaultCode: 395
    
    mibFaultName: fltEquipmentFanPerfThresholdNonCritical
    
    moClass: equipment
    

    fltEquipmentFanPerfThresholdCritical

    Fault Code: F0396

    Message:

    Fan [id] in Fan Module [tray]-[id] under server [id] speed: [perf]

    Explanation:

    This fault occurs when the fan speed read from the fan controller does not match the desired fan speed and has exceeded the critical threshold, indicating that the fan is at risk of failure. This can indicate a problem with a fan or with the reading from the fan controller.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Monitor the fan status.

    Step 2 If the problem persists for a long period of time or if other fans do not show the same problem, reseat the fan.

    Step 3 If the above actions did not resolve the issue, create a tech-support file for the chassis and contact Cisco TAC.


    Fault Details:

    Severity: warning
    
    Cause: performance-problem
    
    mibFaultCode: 396
    
    mibFaultName: fltEquipmentFanPerfThresholdCritical
    
    moClass: equipment:
    

    fltEquipmentFanPerfThresholdNonRecoverable

    Fault Code: F0397

    Message:

    Fan [id] in Fan Module [tray]-[id] under server [id] speed: [perf]

    Explanation:

    This fault occurs when the fan speed read from the fan controller has far exceeded the desired fan speed. It frequently indicates that the fan has failed.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Replace the fan.

    Step 2 If the above action did not resolve the issue, create a tech-support file and contact Cisco TAC.


    Fault Details:

    Severity: major
    
    Cause: performance-problem
    
    mibFaultCode: 397
    
    mibFaultName: fltEquipmentFanPerfThresholdNonRecoverable
    
    moClass: equipment:Fan
    
    Type: equipment
    

    fltEquipmentFanMissing

    Fault Code: F0434

    Message:

    Fan [id] in Fan Module [tray]-[id] under server [id] presence: [presence]

    Explanation:

    This fault occurs in the unlikely event that a fan in a fan module cannot be detected.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Insert or reinsert the fan module in the slot that is reporting the issue.

    Step 2 Replace the fan module with a different fan module, if available.

    Step 3 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


    Fault Details:

    Severity: warning
    
    Cause: equipment-missing
    
    mibFaultCode: 434
    
    mibFaultName: fltEquipmentFanMissing
    
    moClass: equipment:Fan
    
    Type: equipment
    

    I/O Module-Related Faults

    fltEquipmentIOCardRemoved

    Fault Code: F0376

    Message:

    [side] IOM [chassisId]/[id] is removed.

    Explanation:

    This fault typically occurs because an I/O module is removed from the chassis. For a standalone configuration, the chassis associated with the I/O module loses network connectivity. This is a critical fault because it can result in the loss of network connectivity and disrupt data traffic through the I/O module.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Reseat or reinsert the I/O module.

    Step 2 If the above action did not resolve the issue, create a tech-support file and contact Cisco TAC.


    Fault Details:

    Severity: critical
    
    Cause: equipment-removed
    
    mibFaultCode: 376
    
    mibFaultName: fltEquipmentIOCardRemoved
    
    moClass: equipment:IOCard
    
    Type: equipment
    

    fltEquipmentIOCardThermalProblem

    Fault Code: F0379

    Message:

    [side] IOM [chassisId]/[id] operState: [operState]

    Explanation:

    This fault occurs when there is a thermal problem on an I/O module. Be aware of the following possible contributing factors:

    Temperature extremes can cause Cisco UCS equipment to operate at reduced efficiency and cause a variety of problems, including early degradation, failure of chips, and failure of equipment. In addition, extreme temperature fluctuations can cause CPUs to become loose in their sockets.

    Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Review the product specifications to determine the temperature operating range of the I/O module.

    Step 2 Review the Cisco UCS Site Preparation Guide to ensure the I/O modules have adequate airflow, including front and back clearance.

    Step 3 Verify that the air flows on the Cisco UCS chassis are not obstructed.

    Step 4 Verify that the site cooling system is operating properly.

    Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.

    Step 6 Replace faulty I/O modules.

    Step 7 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


    Fault Details:

    Severity: major
    
    Cause: thermal-problem
    
    mibFaultCode: 379
    
    mibFaultName: fltEquipmentIOCardThermalProblem
    
    moClass: equipment:IOCard
    
    Type: environmental
    

    fltEquipmentIOCardThermalThresholdNonCritical

    Fault Code: F0729

    Message:

    [side] IOM [chassisId]/[id] ([switchId]) temperature: [thermal]

    Explanation:

    This fault occurs when the temperature of an I/O module has exceeded a non-critical threshold value, but is still below the critical threshold. Be aware of the following possible contributing factors:

    Temperature extremes can cause Cisco UCS equipment to operate at reduced efficiency and cause a variety of problems, including early degradation, failure of chips, and failure of equipment. In addition, extreme temperature fluctuations can cause CPUs to become loose in their sockets.

    Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).

    If sensors on a CPU reach 179.6F (82C), the system will take that CPU offline.
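    The temperature limits above can be checked with a few small helpers. This Python sketch encodes the 10C to 35C inlet range and the 82C CPU offline point stated in this section; it assumes sensor readings are already available in Celsius or Fahrenheit, and the function names are hypothetical.

```python
INLET_MIN_C = 10.0    # 50F
INLET_MAX_C = 35.0    # 95F
CPU_OFFLINE_C = 82.0  # 179.6F


def f_to_c(temp_f: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def inlet_in_range(temp_c: float) -> bool:
    """True if the inlet air temperature is within the recommended range."""
    return INLET_MIN_C <= temp_c <= INLET_MAX_C


def cpu_at_offline_limit(sensor_c: float) -> bool:
    """True if a CPU sensor has reached the point at which the CPU is taken offline."""
    return sensor_c >= CPU_OFFLINE_C


print(round(f_to_c(95), 1))         # 35.0
print(inlet_in_range(f_to_c(104)))  # False (40C)
print(cpu_at_offline_limit(83.0))   # True
```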

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Review the product specifications to determine the temperature operating range of the I/O module.

    Step 2 Verify that the air flows on the Cisco UCS chassis and I/O module are not obstructed.

    Step 3 Verify that the site cooling system is operating properly.

    Step 4 Power off unused rack servers.

    Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.

    Step 6 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


    Fault Details:

    Severity: minor
    
    Cause: thermal-problem
    
    mibFaultCode: 729
    
    mibFaultName: fltEquipmentIOCardThermalThresholdNonCritical
    
    moClass: equipment:IOCard
    
    Type: environmental
    

    fltEquipmentIOCardThermalThresholdCritical

    Fault Code: F0730

    Message:

    [side] IOM [chassisId]/[id] ([switchId]) temperature: [thermal]

    Explanation:

    This fault occurs when the temperature of an I/O module has exceeded a critical threshold value. Be aware of the following possible contributing factors:

    Temperature extremes can cause Cisco UCS equipment to operate at reduced efficiency and cause a variety of problems, including early degradation, failure of chips, and failure of equipment. In addition, extreme temperature fluctuations can cause CPUs to become loose in their sockets.

    Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).

    If sensors on a CPU reach 179.6F (82C), the system will take that CPU offline.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Review the product specifications to determine the temperature operating range of the I/O module.

    Step 2 Verify that the site cooling system is operating properly.

    Step 3 Power off unused rack servers.

    Step 4 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.

    Step 5 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


    Fault Details:

    Severity: major
    
    Cause: thermal-problem
    
    mibFaultCode: 730
    
    mibFaultName: fltEquipmentIOCardThermalThresholdCritical
    
    moClass: equipment:IOCard
    
    Type: environmental
    

    fltEquipmentIOCardThermalThresholdNonRecoverable

    Fault Code: F0731

    Message:

    [side] IOM [chassisId]/[id] temperature: [thermal]

    Explanation:

    This fault occurs when the temperature of an I/O module has been out of the operating range, and the issue is not recoverable. Be aware of the following possible contributing factors:

    Temperature extremes can cause Cisco UCS equipment to operate at reduced efficiency and cause a variety of problems, including early degradation, failure of chips, and failure of equipment. In addition, extreme temperature fluctuations can cause CPUs to become loose in their sockets.

    Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).

    If sensors on a CPU reach 179.6F (82C), the system will take that CPU offline.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Review the product specifications to determine the temperature operating range of the I/O module.

    Step 2 Verify that the air flows on the Cisco UCS chassis and I/O module are not obstructed.

    Step 3 Verify that the site cooling system is operating properly.

    Step 4 Power off unused rack servers.

    Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.

    Step 6 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


    Fault Details:

    Severity: critical
    
    Cause: thermal-problem
    
    mibFaultCode: 731
    
    mibFaultName: fltEquipmentIOCardThermalThresholdNonRecoverable
    
    moClass: equipment:IOCard
    
    Type: environmental
    

    Memory-Related Faults

    fltMemoryUnitDegraded

    Fault Code: F0184

    Message:

    DIMM [location] on server [chassisId]/[slotId] operability: [operability]
    DIMM [location] on server [id] operability: [operability]

    Explanation:

    This fault occurs when a DIMM is in a degraded operability state. This state typically occurs when an excessive number of correctable ECC errors are reported on the DIMM by the server BIOS.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Monitor the DIMM for further ECC errors. If the high error rate persists, the DIMM is likely to become inoperable.

    Step 2 If the DIMM becomes inoperable, replace the DIMM.

    Step 3 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.
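    Step 1 asks you to watch whether the high ECC error rate persists. One way to do that is to count correctable errors in a sliding time window. The Python sketch below assumes you already have timestamped error counts for one DIMM (for example, collected from the SEL); the class name, the one-hour default window, and the default threshold of 100 errors are hypothetical, not BIOS or CIMC values.

```python
from collections import deque
from typing import Deque, Tuple


class EccWindow:
    """Count correctable ECC errors for one DIMM within a sliding time window.

    window_s and threshold are hypothetical values chosen for illustration.
    Timestamps passed to record() are assumed to be non-decreasing.
    """

    def __init__(self, window_s: float = 3600.0, threshold: int = 100) -> None:
        self.window_s = window_s
        self.threshold = threshold
        self._events: Deque[Tuple[float, int]] = deque()
        self._total = 0

    def record(self, timestamp: float, count: int = 1) -> None:
        """Add `count` errors observed at `timestamp` (seconds)."""
        self._events.append((timestamp, count))
        self._total += count
        self._expire(timestamp)

    def _expire(self, now: float) -> None:
        # Drop events older than the window and keep the running total in sync.
        while self._events and now - self._events[0][0] > self.window_s:
            _, old_count = self._events.popleft()
            self._total -= old_count

    def errors_in_window(self, now: float) -> int:
        self._expire(now)
        return self._total

    def rate_persists(self, now: float) -> bool:
        """True if the windowed error count is at or above the threshold."""
        return self.errors_in_window(now) >= self.threshold


monitor = EccWindow(window_s=60.0, threshold=10)
monitor.record(0.0, 6)
monitor.record(30.0, 5)
print(monitor.rate_persists(30.0))  # True: 11 errors in the last 60 s
print(monitor.rate_persists(61.0))  # False: the 6 errors at t=0 have expired
```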


    Fault Details:

    Severity: warning
    
    Cause: equipment-degraded
    
    mibFaultCode: 184
    
    mibFaultName: fltMemoryUnitDegraded
    
    moClass: memory:Unit
    
    Type: equipment
    

    fltMemoryUnitInoperable

    Fault Code: F0185

    Message:

    DIMM [location] on server [chassisId]/[slotId] operability: [operability]
    DIMM [location] on server [id] operability: [operability]

    Explanation:

    This fault typically occurs because the number of correctable or uncorrectable errors on a DIMM has exceeded a threshold. The DIMM may be inoperable.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 If the SEL is enabled, review the SEL statistics on the DIMM to determine which threshold was crossed.

    Step 2 If necessary, replace the DIMM.

    Step 3 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


    Fault Details:

    Severity: major
    
    Cause: equipment-inoperable
    
    mibFaultCode: 185
    
    mibFaultName: fltMemoryUnitInoperable
    
    moClass: memory:Unit
    

    fltMemoryUnitThermalThresholdNonCritical

    Fault Code: F0186

    Message:

    DIMM [location] on server [chassisId]/[slotId] temperature: [thermal]
    DIMM [location] on server [id] temperature: [thermal]

    Explanation:

    This fault occurs when the temperature of a memory unit on a rack server exceeds a non-critical threshold value, but is still below the critical threshold. Be aware of the following possible contributing factors:

    Temperature extremes can cause Cisco UCS equipment to operate at reduced efficiency and cause a variety of problems, including early degradation, failure of chips, and failure of equipment. In addition, extreme temperature fluctuations can cause CPUs to become loose in their sockets.

    Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).

    If sensors on a CPU reach 179.6F (82C), the system will take that CPU offline.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Review the product specifications to determine the temperature operating range of the server.

    Step 2 Review the Cisco UCS Site Preparation Guide to ensure the servers have adequate airflow, including front and back clearance.

    Step 3 Verify that the air flows on the Cisco UCS chassis or rack server are not obstructed.

    Step 4 Verify that the site cooling system is operating properly.

    Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.

    Step 6 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


    Fault Details:

    Severity: minor
    
    Cause: thermal-problem
    
    mibFaultCode: 186
    
    mibFaultName: fltMemoryUnitThermalThresholdNonCritical
    
    moClass: memory:Unit
    
    Type: environmental
    

    fltMemoryUnitThermalThresholdCritical

    Fault Code: F0187

    Message:

    DIMM [location] on server [chassisId]/[slotId] temperature: [thermal]
    DIMM [location] on server [id] temperature: [thermal]

    Explanation:

    This fault occurs when the temperature of a memory unit on a rack server exceeds a critical threshold value. Be aware of the following possible contributing factors:

    Temperature extremes can cause Cisco UCS equipment to operate at reduced efficiency and cause a variety of problems, including early degradation, failure of chips, and failure of equipment. In addition, extreme temperature fluctuations can cause CPUs to become loose in their sockets.

    Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).

    If sensors on a CPU reach 179.6F (82C), the system will take that CPU offline.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Review the product specifications to determine the temperature operating range of the server.

    Step 2 Review the Cisco UCS Site Preparation Guide to ensure the servers have adequate airflow, including front and back clearance.

    Step 3 Verify that the air flows on the Cisco UCS chassis or rack server are not obstructed.

    Step 4 Verify that the site cooling system is operating properly.

    Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.

    Step 6 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.

    Fault Details:

    Severity: warning
    
    Cause: thermal-problem
    
    mibFaultCode: 187
    
    mibFaultName: fltMemoryUnitThermalThresholdCritical
    
    moClass: memory:Unit
    
    Type: environmental
    

    fltMemoryUnitThermalThresholdNonRecoverable

    Fault Code: F0188

    Message:

    DIMM [location] on server [chassisId]/[slotId] temperature: [thermal]
    DIMM [location] on server [id] temperature: [thermal]

    Explanation:

    This fault occurs when the temperature of a memory unit on a rack server has been out of the operating range, and the issue is not recoverable. Be aware of the following possible contributing factors:

    Temperature extremes can cause Cisco UCS equipment to operate at reduced efficiency and cause a variety of problems, including early degradation, failure of chips, and failure of equipment. In addition, extreme temperature fluctuations can cause CPUs to become loose in their sockets.

    Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).

    If sensors on a CPU reach 179.6F (82C), the system will take that CPU offline.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Review the product specifications to determine the temperature operating range of the server.

    Step 2 Review the Cisco UCS Site Preparation Guide to ensure the servers have adequate airflow, including front and back clearance.

    Step 3 Verify that the air flows on the Cisco UCS chassis or rack server are not obstructed.

    Step 4 Verify that the site cooling system is operating properly.

    Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.

    Step 6 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


    Fault Details:

    Severity: major
    
    Cause: thermal-problem
    
    mibFaultCode: 188
    
    mibFaultName: fltMemoryUnitThermalThresholdNonRecoverable
    
    moClass: memory:Unit
    
    Type: environmental
    

    fltMemoryArrayVoltageThresholdCritical

    Fault Code: F0190

    Message:

    Memory array [id] on server [chassisId]/[slotId] voltage: [voltage]
    Memory array [id] on server [id] voltage: [voltage]

    Explanation:

    This fault occurs when the memory array voltage exceeds the specified hardware voltage rating.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 If the SEL is enabled, look at the SEL statistics on the DIMM to determine which threshold was crossed.

    Step 2 Monitor the memory array for further degradation.

    Step 3 Replace the power supply.

    Step 4 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


    Fault Details:

    Severity: major
    
    Cause: voltage-problem
    
    mibFaultCode: 190
    
    mibFaultName: fltMemoryArrayVoltageThresholdCritical
    
    moClass: memory:Array
    

    fltMemoryArrayVoltageThresholdNonRecoverable

    Fault Code: F0191

    Message:

    Memory array [id] on server [chassisId]/[slotId] voltage: [voltage]
    Memory array [id] on server [id] voltage: [voltage]

    Explanation:

    This fault occurs when the memory array voltage has exceeded the specified hardware voltage rating, and the memory hardware may be damaged or at risk of damage.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 If the SEL is enabled, review the SEL statistics on the DIMM to determine which threshold was crossed.

    Step 2 Monitor the memory array for further degradation.

    Step 3 Replace the power supply.

    Step 4 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


    Fault Details:

    Severity: critical
    
    Cause: voltage-problem
    
    mibFaultCode: 191
    
    mibFaultName: fltMemoryArrayVoltageThresholdNonRecoverable
    
    moClass: memory:Array
    
    Type: environmental
    

    fltMemoryUnitIdentityUnestablishable

    Fault Code: F0502

    Message:

    DIMM [location] on server [chassisId]/[slotId] has an invalid FRU
    DIMM [location] on server [id] has an invalid FRU

    Explanation:

    This fault typically occurs when a sensor has detected an unsupported DIMM in the server. For example, the model, vendor, or revision is not recognized.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Verify that the DIMM is supported in the server configuration.

    Step 2 If the above action did not resolve the issue, you may have unsupported DIMMs or DIMM configuration in the server. Contact Cisco TAC.
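    Step 1 amounts to comparing the DIMM's identity (vendor, model, and revision, the fields named in the explanation) against the server's supported-memory list. The Python sketch below shows that comparison and reports which fields match no supported entry. The DimmIdentity type, the helper names, and the vendor, model, and revision values are hypothetical placeholders; a real list would come from the server's documentation.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class DimmIdentity:
    vendor: str
    model: str
    revision: str


# Hypothetical placeholder entries, not real part numbers.
SUPPORTED: FrozenSet[DimmIdentity] = frozenset({
    DimmIdentity("VendorA", "MODEL-0001", "A1"),
    DimmIdentity("VendorB", "MODEL-0002", "B0"),
})


def is_supported(dimm: DimmIdentity) -> bool:
    """True if the exact (vendor, model, revision) triple is on the list."""
    return dimm in SUPPORTED


def unrecognized_fields(dimm: DimmIdentity) -> List[str]:
    """Name the fields whose values match no supported entry at all."""
    fields = []
    if all(entry.vendor != dimm.vendor for entry in SUPPORTED):
        fields.append("vendor")
    if all(entry.model != dimm.model for entry in SUPPORTED):
        fields.append("model")
    if all(entry.revision != dimm.revision for entry in SUPPORTED):
        fields.append("revision")
    return fields


installed = DimmIdentity("VendorA", "MODEL-0001", "A2")
print(is_supported(installed))         # False
print(unrecognized_fields(installed))  # ['revision']
```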


    Fault Details:

    Severity: warning
    
    Cause: identity-unestablishable
    
    mibFaultCode: 502
    
    mibFaultName: fltMemoryUnitIdentityUnestablishable
    
    moClass: memory:Unit
    
    Type: equipment
    

    Processor-Related Faults

    fltProcessorUnitInoperable

    Fault Code: F0174

    Message:

    Processor [id] on server [chassisId]/[slotId] operability: [operability]

    Explanation:

    This fault occurs when the processor encounters a catastrophic error or exceeds preset thermal or power thresholds.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 If the indicated probable cause is a thermal problem, verify that the airflow to the server is not obstructed and that the server is adequately ventilated. If possible, check that the heat sink is properly seated on the processor.

    Step 2 If the indicated probable cause is that the equipment is inoperable, contact Cisco TAC for further instructions.

    Step 3 If the indicated probable cause is a power or voltage problem, check whether the issue is resolved by using an alternate power supply. If it is not, contact Cisco TAC.
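    The three steps above branch on the probable cause reported with the fault. The Python sketch below maps cause strings to the matching step. The strings thermal-problem, equipment-inoperable, and voltage-problem follow the cause naming used elsewhere in this chapter; power-problem is an assumed string for the power case, and the table and helper names are hypothetical.

```python
from typing import Optional

STEP_BY_CAUSE = {
    "thermal-problem": 1,
    "equipment-inoperable": 2,
    "power-problem": 3,    # assumed cause string
    "voltage-problem": 3,
}


def processor_step(cause: str) -> Optional[int]:
    """Return the step for a probable cause, or None if it is not covered."""
    return STEP_BY_CAUSE.get(cause.strip().lower())


for cause in ("thermal-problem", "voltage-problem", "unknown"):
    step = processor_step(cause)
    print(cause, "->", f"Step {step}" if step else "contact Cisco TAC")
```

    Causes that the table does not cover fall through to contacting Cisco TAC, matching the guidance in Step 2.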


    Fault Details:

    Severity: major
    
    Cause: equipment-inoperable
    
    mibFaultCode: 174
    
    mibFaultName: fltProcessorUnitInoperable
    
    moClass: processor:Unit
    
    Type: equipment
    

    fltProcessorUnitThermalNonCritical

    Fault Code: F0175

    Message:

    Processor [id] on server [chassisId]/[slotId] temperature: [thermal]
    Processor [id] on server [id] temperature: [thermal]

    Explanation:

    This fault occurs when the processor temperature on a rack server exceeds a non-critical threshold value, but is still below the critical threshold. Be aware of the following possible contributing factors:

    Temperature extremes can cause Cisco UCS equipment to operate at reduced efficiency and cause a variety of problems, including early degradation, failure of chips, and failure of equipment. In addition, extreme temperature fluctuations can cause CPUs to become loose in their sockets.

    Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).

    If sensors on a CPU reach 179.6F (82C), the system will take that CPU offline.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Review the product specifications to determine the temperature operating range of the server.

    Step 2 Review the Cisco UCS Site Preparation Guide to ensure the servers have adequate airflow, including front and back clearance.

    Step 3 Verify that the airflow to the Cisco UCS chassis or rack server is not obstructed.

    Step 4 Verify that the site cooling system is operating properly.

    Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.

    Step 6 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


    Fault Details:

    Severity: minor
    
    Cause: thermal-problem
    
    mibFaultCode: 175
    
    mibFaultName: fltProcessorUnitThermalNonCritical
    
    moClass: processor:Unit
    
    Type: environmental
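
    In each entry in this section, the Fault Code is the mibFaultCode written as F followed by four zero-padded digits. For example, F0175 corresponds to mibFaultCode 175, and F1040 corresponds to 1040. The Python sketch below converts between the two forms. It assumes this pattern holds for every fault, and the helper names are hypothetical.

```python
import re

# Display form: the letter F followed by exactly four digits.
_FAULT_CODE = re.compile(r"F(\d{4})")


def fault_code_to_mib(fault_code: str) -> int:
    """Convert a display fault code such as 'F0175' to its mibFaultCode (175)."""
    match = _FAULT_CODE.fullmatch(fault_code.strip())
    if match is None:
        raise ValueError(f"unexpected fault code format: {fault_code!r}")
    return int(match.group(1))


def mib_to_fault_code(mib_code: int) -> str:
    """Convert a mibFaultCode such as 175 to its display form ('F0175')."""
    if not 0 <= mib_code <= 9999:
        raise ValueError(f"mibFaultCode out of range: {mib_code}")
    return f"F{mib_code:04d}"


if __name__ == "__main__":
    print(fault_code_to_mib("F0175"))  # 175
    print(mib_to_fault_code(2500))     # F2500
```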
    

    fltProcessorUnitThermalThresholdCritical

    Fault Code: F0176

    Message:

    Processor [id] on server [chassisId]/[slotId] temperature: [thermal]

    Processor [id] on server [id] temperature: [thermal]

    Explanation:

    This fault occurs when the processor temperature on a rack server exceeds a critical threshold value. Be aware of the following possible contributing factors:

    Temperature extremes can cause Cisco UCS equipment to operate at reduced efficiency and cause a variety of problems, including early degradation, failure of chips, and failure of equipment. In addition, extreme temperature fluctuations can cause CPUs to become loose in their sockets.

    Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).

    If sensors on a CPU reach 179.6F (82C), the system will take that CPU offline.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Review the product specifications to determine the temperature operating range of the server.

    Step 2 Review the Cisco UCS Site Preparation Guide to ensure the servers have adequate airflow, including front and back clearance.

    Step 3 Verify that the airflow to the Cisco UCS chassis or rack server is not obstructed.

    Step 4 Verify that the site cooling system is operating properly.

    Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.

    Step 6 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


    Fault Details:

    Severity: critical
    
    Cause: thermal-problem
    
    mibFaultCode: 176
    
    mibFaultName: fltProcessorUnitThermalThresholdCritical
    
    moClass: processor:Unit
    
    Type: environmental
    

    fltProcessorUnitThermalThresholdNonRecoverable

    Fault Code: F0177

    Message:

    Processor [id] on server [chassisId]/[slotId] temperature: [thermal]

    Processor [id] on server [id] temperature: [thermal]

    Explanation:

    This fault occurs when the processor temperature on a rack server has been out of the operating range, and the issue is not recoverable. Be aware of the following possible contributing factors:

    Temperature extremes can cause Cisco UCS equipment to operate at reduced efficiency and cause a variety of problems, including early degradation, failure of chips, and failure of equipment. In addition, extreme temperature fluctuations can cause CPUs to become loose in their sockets.

    Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).

    If sensors on a CPU reach 179.6F (82C), the system will take that CPU offline.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Review the product specifications to determine the temperature operating range of the server.

    Step 2 Review the Cisco UCS Site Preparation Guide to ensure the servers have adequate airflow, including front and back clearance.

    Step 3 Verify that the airflow to the Cisco UCS chassis or rack server is not obstructed.

    Step 4 Verify that the site cooling system is operating properly.

    Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.

    Step 6 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


    Fault Details:

    Severity: non-recoverable
    
    Cause: thermal-problem
    
    mibFaultCode: 177
    
    mibFaultName: fltProcessorUnitThermalThresholdNonRecoverable
    
    moClass: processor:Unit
    
    Type: environmental
    

    fltProcessorUnitDisabled

    Fault Code: F0842

    Message:

    Processor [id] on server [chassisId]/[slotId] operState: [operState]

    Processor [id] on server [id] operState: [operState]

    Explanation:

    This fault occurs in the unlikely event that a processor is disabled.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Remove and reinsert the server into the chassis.

    Step 2 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


    Fault Details:

    Severity: info
    
    Cause: equipment-disabled
    
    mibFaultCode: 842
    
    mibFaultName: fltProcessorUnitDisabled
    
    moClass: processor:Unit
    
    Type: environmental
    

    Power Supply-Related Faults

    fltEquipmentPsuInoperable

    Fault Code: F0374

    Message:

    [operability]

    Power supply [id] in server [id] operability: [operability]

    Explanation:

    This fault typically occurs when either the power supply unit is offline or its input or output voltage is out of range.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Verify that the power cord is properly connected to the PSU and the power source.

    Step 2 Verify that the power source is 220 volts.

    Step 3 Remove the PSU and reinstall it.

    Step 4 Replace the PSU.

    Step 5 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


    Fault Details:

    Severity: major
    
    Cause: equipment-inoperable
    
    mibFaultCode: 374
    
    mibFaultName: fltEquipmentPsuInoperable
    
    moClass: equipment:Psu
    
    Type: equipment
    

    fltEquipmentPsuThermalThresholdNonCritical

    Fault Code: F0381

    Message:

    [thermal]

    Power supply [id] in server [id] temperature: [thermal]

    Explanation:

    This fault occurs when the temperature of a PSU module has exceeded a non-critical threshold value, but is still below the critical threshold. Be aware of the following possible contributing factors:

    Temperature extremes can cause Cisco UCS equipment to operate at reduced efficiency and cause a variety of problems, including early degradation, failure of chips, and failure of equipment. In addition, extreme temperature fluctuations can cause CPUs to become loose in their sockets.

    Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Review the product specifications to determine the temperature operating range of the PSU module.

    Step 2 Review the Cisco UCS Site Preparation Guide to ensure the PSU modules have adequate airflow, including front and back clearance.

    Step 3 Verify that the airflow is not obstructed.

    Step 4 Verify that the site cooling system is operating properly.

    Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.

    Step 6 Replace faulty PSU modules.

    Step 7 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


    Fault Details:

    Severity: minor
    
    Cause: thermal-problem
    
    mibFaultCode: 381
    
    mibFaultName: fltEquipmentPsuThermalThresholdNonCritical
    
    moClass: equipment:Psu
    
    Type: environmental
    

    fltEquipmentPsuThermalThresholdCritical

    Fault Code: F0383

    Message:

    [thermal]

    Power supply [id] in server [id] temperature: [thermal]

    Explanation:

    This fault occurs when the temperature of a PSU module has exceeded a critical threshold value. Be aware of the following possible contributing factors:

    Temperature extremes can cause Cisco UCS equipment to operate at reduced efficiency and cause a variety of problems, including early degradation, failure of chips, and failure of equipment. In addition, extreme temperature fluctuations can cause CPUs to become loose in their sockets.

    Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Review the product specifications to determine the temperature operating range of the PSU module.

    Step 2 Review the Cisco UCS Site Preparation Guide to ensure the PSU modules have adequate airflow, including front and back clearance.

    Step 3 Verify that the airflow is not obstructed.

    Step 4 Verify that the site cooling system is operating properly.

    Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.

    Step 6 Replace faulty PSU modules.

    Step 7 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


    Fault Details:

    Severity: warning
    
    Cause: thermal-problem
    
    mibFaultCode: 383
    
    mibFaultName: fltEquipmentPsuThermalThresholdCritical
    
    moClass: equipment:Psu
    
    Type: environmental
    

    fltEquipmentPsuMissing

    Fault Code: F0378

    Message:

    [presence]

    Power supply [id] in server [id] presence: [presence]

    Explanation:

    This fault typically occurs when either the power supply module is missing or the input power to the server is absent.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Verify that the power supply is connected to a power source.

    Step 2 If the PSU is physically present in the slot, remove and then reinsert it.

    Step 3 If the PSU is not physically present in the slot, insert a new PSU.

    Step 4 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


    Fault Details:

    Severity: warning
    
    Cause: equipment-missing
    
    mibFaultCode: 378
    
    mibFaultName: fltEquipmentPsuMissing
    
    moClass: equipment:Psu
    
    Type: equipment
    

    fltEquipmentPsuThermalThresholdNonRecoverable

    Fault Code: F0385

    Message:

    [thermal]

    Power supply [id] in server [id] temperature: [thermal]

    Explanation:

    This fault occurs when the temperature of a PSU module has been out of operating range, and the issue is not recoverable. Be aware of the following possible contributing factors:

    Temperature extremes can cause Cisco UCS equipment to operate at reduced efficiency and cause a variety of problems, including early degradation, failure of chips, and failure of equipment. In addition, extreme temperature fluctuations can cause CPUs to become loose in their sockets.

    Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Review the product specifications to determine the temperature operating range of the PSU module.

    Step 2 Review the Cisco UCS Site Preparation Guide to ensure the PSU modules have adequate airflow, including front and back clearance.

    Step 3 Verify that the airflow is not obstructed.

    Step 4 Verify that the site cooling system is operating properly.

    Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.

    Step 6 Replace faulty PSU modules.

    Step 7 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


    Fault Details:

    Severity: major
    
    Cause: thermal-problem
    
    mibFaultCode: 385
    
    mibFaultName: fltEquipmentPsuThermalThresholdNonRecoverable
    
    moClass: equipment:Psu
    
    Type: environmental
    

    fltEquipmentPsuVoltageThresholdCritical

    Fault Code: F0389

    Message:

    [voltage]

    Power supply [id] in server [id] voltage: [voltage]

    Explanation:

    This fault occurs when the PSU voltage has exceeded the specified hardware voltage rating.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Remove and reseat the PSU.

    Step 2 If the above action did not resolve the issue, create a tech-support file and contact Cisco TAC.


    Fault Details:

    Severity: warning
    
    Cause: voltage-problem
    
    mibFaultCode: 389
    
    mibFaultName: fltEquipmentPsuVoltageThresholdCritical
    
    moClass: equipment:Psu
    
    Type: environmental
    

    fltEquipmentPsuVoltageThresholdNonRecoverable

    Fault Code: F0391

    Message:

    [voltage]

    Power supply [id] in server [id] voltage: [voltage]

    Explanation:

    This fault occurs when the PSU voltage has exceeded the specified hardware voltage rating, and the PSU hardware may have been damaged as a result or may be at risk of damage.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Remove and reseat the PSU.

    Step 2 If the above action did not resolve the issue, create a tech-support file and contact Cisco TAC.


    Fault Details:

    Severity: major
    
    Cause: voltage-problem
    
    mibFaultCode: 391
    
    mibFaultName: fltEquipmentPsuVoltageThresholdNonRecoverable
    
    moClass: equipment:Psu
    
    Type: environmental
    

    fltEquipmentPsuPerfThresholdNonCritical

    Fault Code: F0392

    Message:

    [perf]

    Power supply [id] in server [id] output power: [perf]

    Explanation:

    This fault is raised when the current output of the PSU in a rack server does not match the desired output value.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Monitor the PSU status.

    Step 2 If possible, remove and reseat the PSU.

    Step 3 If the above actions did not resolve the issue, create a tech-support file for the chassis and contact Cisco TAC.


    Fault Details:

    Severity: minor
    
    Cause: power-problem
    
    mibFaultCode: 392
    
    mibFaultName: fltEquipmentPsuPerfThresholdNonCritical
    
    moClass: equipment:Psu
    
    Type: equipment
    

    fltEquipmentPsuPerfThresholdCritical

    Fault Code: F0393

    Message:

    [perf]

    Power supply [id] in server [id] output power: [perf]

    Explanation:

    This fault is raised as a warning if the current output of the PSU in a rack server does not match the desired output value.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Monitor the PSU status.

    Step 2 If possible, remove and reseat the PSU.

    Step 3 If the above actions did not resolve the issue, create a tech-support file for the chassis and contact Cisco TAC.


    Fault Details:

    Severity: warning
    
    Cause: power-problem
    
    mibFaultCode: 393
    
    mibFaultName: fltEquipmentPsuPerfThresholdCritical
    
    moClass: equipment:Psu
    
    Type: equipment
    

    fltEquipmentPsuPerfThresholdNonRecoverable

    Fault Code: F0394

    Message:

    [perf]

    Power supply [id] in server [id] output power: [perf]

    Explanation:

    This fault is raised when the current output of the PSU in a rack server does not match the desired output value.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Monitor the PSU status.

    Step 2 If possible, remove and reseat the PSU.

    Step 3 If the above actions did not resolve the issue, create a tech-support file for the chassis and contact Cisco TAC.


    Fault Details:

    Severity: major
    
    Cause: power-problem
    
    mibFaultCode: 394
    
    mibFaultName: fltEquipmentPsuPerfThresholdNonRecoverable
    
    moClass: equipment:Psu
    
    Type: equipment
    

    fltEquipmentPsuIdentity

    Fault Code: F0407

    Message:

    Power supply [id] on chassis [id] has a malformed FRU

    Power supply [id] on server [id] has a malformed FRU

    Explanation:

    This fault typically occurs when the FRU information for a power supply unit is corrupted or malformed.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Verify the vendor specification for the power supply.

    Step 2 If the above action did not resolve the issue, create a tech-support file and contact Cisco TAC.


    Fault Details:

    Severity: critical
    
    Cause: fru-problem
    
    mibFaultCode: 407
    
    mibFaultName: fltEquipmentPsuIdentity
    
    moClass: equipment:Psu
    
    Type: equipment
    

    fltPowerChassisMemberChassisPsuRedundanceFailure

    Fault Code: F0743

    Message:

    Chassis [id] was configured for redundancy, but running in a non-redundant configuration.

    Explanation:

    This fault typically occurs when chassis power redundancy has failed.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Consider adding more PSUs to the chassis.

    Step 2 Replace any non-functional PSUs.

    Step 3 If the above actions did not resolve the issue, create a show tech-support file and contact Cisco TAC.
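
    The redundancy condition behind this fault can be sketched in Python. The sketch assumes an N+1 policy, in which the chassis needs a certain number of operable PSUs to carry its load plus at least one spare. The function names and the required_psus parameter are hypothetical. The actual redundancy policy depends on the platform and its configuration.

```python
def is_n_plus_1(operable_psus: int, required_psus: int) -> bool:
    """Return True if at least one operable PSU exists beyond what the load requires."""
    if required_psus < 1 or operable_psus < 0:
        raise ValueError("required_psus must be >= 1 and operable_psus >= 0")
    return operable_psus >= required_psus + 1


def redundancy_lost(configured_redundant: bool, operable_psus: int,
                    required_psus: int) -> bool:
    """Return True when redundancy is configured but the chassis is not actually redundant."""
    return configured_redundant and not is_n_plus_1(operable_psus, required_psus)


if __name__ == "__main__":
    # Hypothetical chassis: one PSU carries the load, and one of two PSUs has failed.
    print(redundancy_lost(True, operable_psus=1, required_psus=1))  # True
    print(redundancy_lost(True, operable_psus=2, required_psus=1))  # False
```

    Under this model, both remedies in the steps above (adding PSUs and replacing non-functional ones) work by raising operable_psus until is_n_plus_1 returns True.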


    Fault Details:

    Severity: major
    
    Cause: psu-redundancy-fail
    
    mibFaultCode: 743
    
    mibFaultName: fltPowerChassisMemberChassisPsuRedundanceFailure
    
    moClass: power:ChassisMember
    
    Type: environmental
    

    fltEquipmentPsuPowerThreshold

    Fault Code: F0882

    Message:

    Power supply [id] on chassis [id] has exceeded its power threshold

    Power supply [id] on server [id] has exceeded its power threshold.

    Explanation:

    This fault occurs when a power supply unit is drawing too much current.

    Recommended Action:

    If you see this fault, create a tech-support file and contact Cisco TAC.


    Fault Details:

    Severity: critical
    
    Cause: power-problem
    
    mibFaultCode: 882
    
    mibFaultName: fltEquipmentPsuPowerThreshold
    
    moClass: equipment:Psu
    
    Type: equipment
    

    fltEquipmentPsuInputError

    Fault Code: F0883

    Message:

    Power supply [id] on chassis [id] has disconnected cable or bad input voltage

    Power supply [id] on server [id] has disconnected cable or bad input voltage.

    Explanation:

    This fault occurs when a power cable is disconnected or the input voltage is incorrect.

    Recommended Action:

    If you see this fault, create a tech-support file and contact Cisco TAC.


    Fault Details:

    Severity: critical
    
    Cause: power-problem
    
    mibFaultCode: 883
    
    mibFaultName: fltEquipmentPsuInputError
    
    moClass: equipment:Psu
    
    Type: equipment
    

    Server-Related Faults

    fltComputeBoardPowerError

    Fault Code: F0310

    Message:

    Motherboard of server [chassisId]/[slotId] (service profile: [assignedToDn]) power: [operPower]

    Motherboard of server [id] (service profile: [assignedToDn]) power: [operPower]
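
    In these messages, bracketed names such as [chassisId], [slotId], [assignedToDn], and [operPower] are placeholders. They are replaced with values from the affected object when the fault is raised. The Python sketch below performs that substitution, which can help when matching logged fault text against this reference. The function names and the sample values are hypothetical.

```python
import re
from typing import List, Mapping

# A placeholder is a bracketed identifier such as [slotId].
_PLACEHOLDER = re.compile(r"\[([A-Za-z]+)\]")


def placeholders(template: str) -> List[str]:
    """List the placeholder names in a message template, in order of appearance."""
    return _PLACEHOLDER.findall(template)


def render_message(template: str, values: Mapping[str, object]) -> str:
    """Replace every [name] in the template with str(values[name])."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder [{name}]")
        return str(values[name])

    return _PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    template = ("Motherboard of server [chassisId]/[slotId] "
                "(service profile: [assignedToDn]) power: [operPower]")
    print(placeholders(template))
    # Sample values are made up for illustration.
    print(render_message(template, {
        "chassisId": 1, "slotId": 2,
        "assignedToDn": "org-root/ls-example", "operPower": "failed",
    }))
```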

    Explanation:

    This fault typically occurs when the server power sensors have detected a problem.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Reseat or replace the power supply.

    Step 2 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


    Fault Details:

    Severity: major
    
    Cause: power-problem
    
    mibFaultCode: 310
    
    mibFaultName: fltComputeBoardPowerError
    
    moClass: compute:Board
    
    Type: environmental
    

    fltComputePhysicalBiosPostTimeout

    Fault Code: F0313

    Message:

    Server [id] (service profile: [assignedToDn]) BIOS failed power-on self test

    Server [chassisId]/[slotId] (service profile: [assignedToDn]) BIOS failed power-on self test.

    Explanation:

    This fault typically occurs when the server has encountered a diagnostic failure.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Connect to the CIMC Web UI and, using the KVM console, record the point at which the POST failure occurred.

    Step 2 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


    Fault Details:

    Severity: critical
    
    Cause: equipment-inoperable
    
    mibFaultCode: 313
    
    mibFaultName: fltComputePhysicalBiosPostTimeout
    
    moClass: compute:Physical
    
    Type: equipment
    

    fltComputeBoardCmosVoltageThresholdCritical

    Fault Code: F0424

    Message:

    Possible loss of CMOS settings: CMOS battery voltage on server [chassisId]/[slotId] is [cmosVoltage]

    Possible loss of CMOS settings: CMOS battery voltage on server [id] is [cmosVoltage]

    Explanation:

    This fault is raised when the CMOS battery voltage has dropped below the normal operating range. This could affect the system clock and other CMOS settings.

    Recommended Action:

    If you see this fault, replace the battery.


    Fault Details:

    Severity: critical
    
    Cause: voltage-problem
    
    mibFaultCode: 424
    
    mibFaultName: fltComputeBoardCmosVoltageThresholdCritical
    
    moClass: compute:Board
    
    Type: environmental
    

    fltComputeBoardCmosVoltageThresholdNonRecoverable

    Fault Code: F0425

    Message:

    Possible loss of CMOS settings: CMOS battery voltage on server [chassisId]/[slotId] is [cmosVoltage]

    Possible loss of CMOS settings: CMOS battery voltage on server [id] is [cmosVoltage]

    Explanation:

    This fault is raised when the CMOS battery voltage has dropped well below the normal operating range and is unlikely to recover. This affects the system clock and other CMOS settings.

    Recommended Action:

    If you see this fault, replace the battery.


    Fault Details:

    Severity: major
    
    Cause: voltage-problem
    
    mibFaultCode: 425
    
    mibFaultName: fltComputeBoardCmosVoltageThresholdNonRecoverable
    
    moClass: compute:Board
    
    Type: environmental
    

    fltComputeIOHubThermalNonCritical

    Fault Code: F0538

    Message:

    IO Hub on server [chassisId]/[slotId] temperature: [thermal]

    Explanation:

    This fault is raised when the IO controller temperature is outside the upper or lower non-critical threshold.

    Recommended Action:

    If you see this fault, monitor other environmental events related to this server and ensure that temperatures are within the recommended ranges.


    Fault Details:

    Severity: minor
    
    Cause: thermal-problem
    
    mibFaultCode: 538
    
    mibFaultName: fltComputeIOHubThermalNonCritical
    
    moClass: compute:IOHub
    
    Type: environmental
    

    fltComputeIOHubThermalThresholdCritical

    Fault Code: F0539

    Message:

    IO Hub on server [chassisId]/[slotId] temperature: [thermal]

    Explanation:

    This fault is raised when the IO controller temperature is outside the upper or lower critical threshold.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Monitor other environmental events related to the server and ensure that temperatures are within the recommended ranges.

    Step 2 If possible, consider powering off the server for a while.

    Step 3 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


    Fault Details:

    Severity: major
    
    Cause: thermal-problem
    
    mibFaultCode: 539
    
    mibFaultName: fltComputeIOHubThermalThresholdCritical
    
    moClass: compute:IOHub
    
    Type: environmental
    

    fltComputeIOHubThermalThresholdNonRecoverable

    Fault Code: F0540

    Message:

    IO Hub on server [chassisId]/[slotId] temperature: [thermal]

    Explanation:

    This fault is raised when the IO controller temperature is outside the recoverable range of operation.
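
    The three IO Hub faults (F0538, F0539, and F0540) correspond to increasingly severe threshold bands around a sensor reading. The bands follow the same lower and upper non-critical, critical, and non-recoverable scheme that IPMI sensor thresholds use. The Python sketch below maps a reading to one of these faults. The threshold values in the example are hypothetical. Treating a reading that equals a threshold as having crossed it is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Thresholds:
    """Six sensor thresholds, ordered from lowest to highest."""
    lower_nr: float  # lower non-recoverable
    lower_cr: float  # lower critical
    lower_nc: float  # lower non-critical
    upper_nc: float  # upper non-critical
    upper_cr: float  # upper critical
    upper_nr: float  # upper non-recoverable

    def __post_init__(self) -> None:
        values = [self.lower_nr, self.lower_cr, self.lower_nc,
                  self.upper_nc, self.upper_cr, self.upper_nr]
        if values != sorted(values):
            raise ValueError("thresholds must be in ascending order")


# IO Hub fault raised for each band, per the entries in this section.
IO_HUB_FAULTS = {
    "non-critical": "F0538",
    "critical": "F0539",
    "non-recoverable": "F0540",
}


def band(reading: float, t: Thresholds) -> Optional[str]:
    """Return the most severe band the reading falls in, or None if it is normal."""
    if reading <= t.lower_nr or reading >= t.upper_nr:
        return "non-recoverable"
    if reading <= t.lower_cr or reading >= t.upper_cr:
        return "critical"
    if reading <= t.lower_nc or reading >= t.upper_nc:
        return "non-critical"
    return None


if __name__ == "__main__":
    example = Thresholds(0, 5, 10, 85, 95, 105)  # hypothetical values, in C
    for reading in (50, 88, 97, 110):
        b = band(reading, example)
        print(reading, b, IO_HUB_FAULTS.get(b))
```

    The most severe band is checked first, so a reading beyond the non-recoverable threshold raises only F0540. It does not also raise the lower-severity faults.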

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Shut down the server immediately.

    Step 2 Create a tech-support file and contact Cisco TAC.


    Fault Details:

    Severity: critical
    
    Cause: thermal-problem
    
    mibFaultCode: 540
    
    mibFaultName: fltComputeIOHubThermalThresholdNonRecoverable
    
    moClass: compute:IOHub
    
    Type: environmental
    

    fltComputePhysicalPostFailure

    Fault Code: F0517

    Message:

    Server [id] POST or diagnostic failure

    Server [chassisId]/[slotId] POST or diagnostic failure.

    Explanation:

    This fault typically occurs when the server has encountered a diagnostic failure or an error during POST.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Check the POST result for the server.

    Step 2 Reboot the server.

    Step 3 If the above actions did not resolve the issue, create a tech-support file and contact Cisco Technical Support.


    Fault Details:

    Severity: major
    
    Cause: equipment-problem
    
    mibFaultCode: 517
    
    mibFaultName: fltComputePhysicalPostFailure
    
    moClass: compute:Physical
    
    Type: server
    

    fltComputeBoardPowerFail

    Fault Code: F0868

    Message:

    [power]

    Motherboard of server [id] power: [power]

    Explanation:

    This fault typically occurs when the power sensors on a server detect a problem.

    Recommended Action:

    If you see this fault, create a tech-support file and contact Cisco TAC.


    Fault Details:

    Severity: critical
    
    Cause: power-problem
    
    mibFaultCode: 868
    
    mibFaultName: fltComputeBoardPowerFail
    
    moClass: compute:Board
    
    Type: environmental
    

    fltComputeBoardThermalProblem

    Fault Code: F0869

    Message:

    Motherboard of server [chassisId]/[slotId] (service profile: [assignedToDn]) thermal: [thermal]

    Motherboard of server [id] (service profile: [assignedToDn]) thermal: [thermal]

    Explanation:

    This fault typically occurs when the motherboard thermal sensors on a server detect a problem.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Verify that the server fans are working properly.

    Step 2 Wait for 24 hours to see if the problem resolves itself.

    Step 3 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


    Fault Details:

    Severity: major
    
    Cause: thermal-problem
    
    mibFaultCode: 869
    
    mibFaultName: fltComputeBoardThermalProblem
    
    moClass: compute:Board
    
    Type: environmental
    

    fltComputeBoardMotherBoardVoltageUpperThresholdCritical

    Fault Code: F0920

    Message:

    "sys/rack-unit-1/board"

    Explanation:

    This fault typically occurs when one or more motherboard input voltages have exceeded upper critical thresholds.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Reseat or replace the power supply.

    Step 2 If the issue persists, create a tech-support file and contact Cisco TAC.


    Fault Details:

    Severity: major 
    
    Cause: voltage-problem
    
    mibFaultCode: 920
    
    mibFaultName: fltComputeBoardMotherBoardVoltageUpperThresholdCritical
    
    moClass: compute:Board
    
    Type: environmental
    

    fltComputeBoardPowerUsageProblem

    Fault Code: F1040

    Message:

    "sys/rack-unit-1/board"

    Explanation:

    This fault typically occurs when the power usage sensors on a server detect that motherboard power consumption has exceeded certain threshold limits.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Contact Cisco TAC.


    Fault Details:

    Severity: warning 
    
    Cause: power-problem
    
    mibFaultCode: 1040
    
    mibFaultName: fltComputeBoardPowerUsageProblem
    
    moClass: compute:Board
    
    Type: environmental
    

    fltComputeBoardMotherBoardVoltageThresholdUpperNonRecoverable

    Fault Code: F0918

    Message:

    "sys/rack-unit-1/board"

    Explanation:

    This fault typically occurs when one or more motherboard input voltages have become too high and are unlikely to recover.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Contact Cisco TAC.


    Fault Details:

    Severity: critical
    
    Cause: voltage-problem
    
    mibFaultCode: 918
    
    mibFaultName: fltComputeBoardMotherBoardVoltageThresholdUpperNonRecoverable
    
    moClass: compute:Board
    
    Type: environmental
    

    fltComputeBoardMotherBoardVoltageThresholdLowerNonRecoverable

    Fault Code: F0919

    Message:

    "sys/rack-unit-1/board"

    Explanation:

    This fault typically occurs when one or more motherboard input voltages have dropped too low and are unlikely to recover.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Contact Cisco TAC.


    Fault Details:

    Severity: critical
    
    Cause: voltage-problem
    
    mibFaultCode: 919
    
    mibFaultName: fltComputeBoardMotherBoardVoltageThresholdLowerNonRecoverable
    
    moClass: compute:Board
    
    Type: environmental
    

    fltComputeBoardMotherBoardVoltageLowerThresholdCritical

    Fault Code: F0921

    Message:

    "sys/rack-unit-1/board"

    Explanation:

    This fault typically occurs when one or more motherboard input voltages have crossed lower critical thresholds.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Reseat or replace the power supply.

    Step 2 If the issue persists, create a tech-support file and contact Cisco TAC.


    Fault Details:

    Severity: major
    
    Cause: voltage-problem
    
    mibFaultCode: 921
    
    mibFaultName: fltComputeBoardMotherBoardVoltageLowerThresholdCritical
    
    moClass: compute:Board
    
    Type: environmental
    

    fltMemoryUnitECCThresholdNonCritical

    Fault Code: F2500

    Message:

    "sys/rack-unit-1/board/memarray-%d/mem-%d"

    Explanation:

    This fault indicates that the memory DIMM has crossed a non-critical threshold of reported ECC errors.
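
    The Message field for the memory ECC faults is a printf-style template for the distinguished name (DN) of the affected DIMM, in which each %d is replaced by an integer. The Python sketch below builds and parses such a DN. It assumes that the first %d is the memory array index and the second is the DIMM index. The path segment names suggest this, but the section does not state it. The helper names are hypothetical.

```python
import re
from typing import Tuple

DIMM_DN_TEMPLATE = "sys/rack-unit-1/board/memarray-%d/mem-%d"
_DIMM_DN = re.compile(r"sys/rack-unit-1/board/memarray-(\d+)/mem-(\d+)")


def dimm_dn(array_index: int, dimm_index: int) -> str:
    """Fill the two %d fields of the template (assumed: array index, then DIMM index)."""
    return DIMM_DN_TEMPLATE % (array_index, dimm_index)


def parse_dimm_dn(dn: str) -> Tuple[int, int]:
    """Extract (array_index, dimm_index) from a DIMM DN."""
    match = _DIMM_DN.fullmatch(dn)
    if match is None:
        raise ValueError(f"not a DIMM DN: {dn!r}")
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    dn = dimm_dn(1, 4)
    print(dn)                 # sys/rack-unit-1/board/memarray-1/mem-4
    print(parse_dimm_dn(dn))  # (1, 4)
```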

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Continue to monitor the ECC errors reported by the memory DIMM. If the error count exceeds the non-recoverable threshold, replace the memory DIMM.

    Step 2 Monitor the server for temperature and voltage threshold crossings.


    Fault Details:

    Severity: minor
    
    Cause: equipment-degraded
    
    mibFaultCode: 2500
    
    mibFaultName: fltMemoryUnitECCThresholdNonCritical
    
    moClass: memory:Unit
    
    Type: equipment
    

    fltMemoryUnitECCThresholdCritical

    Fault Code: F2501

    Message:

    "sys/rack-unit-1/board/memarray-%d/mem-%d"

    Explanation:

    This fault indicates that the memory DIMM has crossed a critical threshold of reported ECC errors.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Continue to monitor the ECC errors reported by the memory DIMM. If the error count exceeds the non-recoverable threshold, replace the memory DIMM.

    Step 2 Monitor the server for temperature and voltage threshold crossings.


    Fault Details:

    Severity: warning
    
    Cause: equipment-degraded
    
    mibFaultCode: 2501
    
    mibFaultName: fltMemoryUnitECCThresholdCritical
    
    moClass: memory:Unit
    
    Type: equipment
    

    fltMemoryUnitECCThresholdNonRecoverable

    Fault Code: F2502

    Message:

    "sys/rack-unit-1/board/memarray-%d/mem-%d"

    Explanation:

    This fault indicates that the memory DIMM has crossed a non-recoverable threshold of reported ECC errors.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Replace the memory DIMM.


    Fault Details:

    Severity: major
    
    Cause: equipment-inoperable
    
    mibFaultCode: 2502
    
    mibFaultName: fltMemoryUnitECCThresholdNonRecoverable
    
    moClass: memory:Unit
    
    Type: equipment
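
    The three ECC entries share the same DN template as their Message, and their recommended actions differ mainly in whether the DIMM is monitored or replaced. The sketch below, which assumes you already have the Fault Code and the DN as strings, extracts the memory array and DIMM numbers from the DN and maps the fault to the action condensed from these entries. The function and table names are hypothetical, and the DN values in the example are illustrative only.

```python
import re

# Matches the DN template "sys/rack-unit-1/board/memarray-%d/mem-%d".
MEM_DN = re.compile(r"^sys/rack-unit-1/board/memarray-(\d+)/mem-(\d+)$")

# Actions condensed from the ECC entries above (hypothetical labels).
ECC_ACTIONS = {
    "F2500": "monitor",  # fltMemoryUnitECCThresholdNonCritical
    "F2501": "monitor",  # fltMemoryUnitECCThresholdCritical
    "F2502": "replace",  # fltMemoryUnitECCThresholdNonRecoverable
}


def ecc_triage(fault_code: str, dn: str) -> tuple:
    """Return (memory array, DIMM number, action) for an ECC threshold fault."""
    match = MEM_DN.match(dn)
    if match is None:
        raise ValueError(f"not a memory DIMM DN: {dn!r}")
    if fault_code not in ECC_ACTIONS:
        raise ValueError(f"not an ECC threshold fault: {fault_code!r}")
    return int(match.group(1)), int(match.group(2)), ECC_ACTIONS[fault_code]


print(ecc_triage("F2502", "sys/rack-unit-1/board/memarray-1/mem-3"))
# (1, 3, 'replace')
```
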
    

    Storage-Related Faults

    fltStorageLocalDiskInoperable

    Fault Code: F0181

    Message:

    Local disk [id] on server [chassisId]/[slotId] operability: [operability]

    Local disk [id] on server [id] operability: [operability]

    Explanation:

    This fault occurs when the local disk has become inoperable or has been removed while the server was in use.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Insert the disk in a supported slot.

    Step 2 Remove and reinsert the local disk.

    Step 3 If a spare disk is available, replace the disk.

    Step 4 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.


    Fault Details:

    Severity: major
    
    Cause: equipment-inoperable
    
    mibFaultCode: 181
    
    mibFaultName: fltStorageLocalDiskInoperable
    
    moClass: storage:LocalDisk
    

    fltStorageRaidBatteryInoperable

    Fault Code: F0531

    Message:

    RAID Battery on server [chassisId]/[slotId] operability: [operability]

    RAID Battery on server [id] operability: [operability]

    Explanation:

    This fault occurs when the RAID battery voltage is below the normal operating range.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Replace the RAID battery.

    Step 2 If the above action did not resolve the issue, create a tech-support file and contact Cisco TAC.


    Fault Details:

    Severity: major
    
    Cause: equipment-inoperable
    
    mibFaultCode: 531
    
    mibFaultName: fltStorageRaidBatteryInoperable
    
    moClass: storage:RaidBattery
    
    Type: equipment
    

    fltStorageLocalDiskCopybackFailed

    Fault Code: F0978

    Message:

    "sys/rack-unit-1/board/storage-%s-ctlr-%d/pd-%d"

    Explanation:

    This fault indicates a physical disk copyback failure. This fault could indicate a physical drive problem or an issue with the RAID configuration.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Replace the physical drive and check whether the issue is resolved after the rebuild.

    Step 2 Reseat or replace the storage controller.

    Step 3 Check configuration options for the storage controller in the MegaRAID ROM configuration page.


    Fault Details:

    Severity: warning
    
    Cause: equipment-offline
    
    mibFaultCode: 978
    
    mibFaultName: fltStorageLocalDiskCopybackFailed
    
    moClass: storage:LocalDisk
    
    Type: equipment 
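
    Most storage entries in this section use DN templates of the form "sys/rack-unit-1/board/storage-%s-ctlr-%d", optionally followed by a "/pd-%d", "/vd-%d", or "/raid-battery-%d" component. A minimal sketch for splitting such a DN into its parts is shown below. It assumes the "%s" field may itself contain hyphens, so it is matched lazily up to "-ctlr-"; the function name, the returned keys, and the example values such as "SAS" are hypothetical.

```python
import re

# Controller DN, with an optional physical drive (pd), virtual drive (vd),
# or RAID battery child component.
STORAGE_DN = re.compile(
    r"^sys/rack-unit-1/board/storage-(?P<kind>.+?)-ctlr-(?P<ctlr>\d+)"
    r"(?:/(?P<child>pd|vd|raid-battery)-(?P<index>\d+))?$"
)


def parse_storage_dn(dn: str) -> dict:
    """Split a storage DN into its controller and optional child component."""
    match = STORAGE_DN.match(dn)
    if match is None:
        raise ValueError(f"not a storage DN: {dn!r}")
    parts = match.groupdict()
    return {
        "controller_type": parts["kind"],
        "controller": int(parts["ctlr"]),
        # A DN with no child component refers to the controller itself.
        "component": parts["child"] or "controller",
        "index": int(parts["index"]) if parts["index"] is not None else None,
    }


print(parse_storage_dn("sys/rack-unit-1/board/storage-SAS-ctlr-0/pd-5"))
```
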
    

    fltStorageRaidBatteryDegraded

    Fault Code: F0969

    Message:

    "sys/rack-unit-1/board/storage-%s-ctlr-%d/raid-battery-%d"

    Explanation:

    This fault indicates a controller battery backup unit failure.

    Recommended Action:

    If you see this fault, take the following action:


    Step 1 Reseat or replace the battery backup unit on the storage controller.


    Fault Details:

    Severity: warning
    
    Cause: equipment-degraded
    
    mibFaultCode: 969
    
    mibFaultName: fltStorageRaidBatteryDegraded
    
    moClass: storage:RaidBattery
    
    Type: equipment
    

    fltStorageRaidBatteryRelearnAborted

    Fault Code: F0970

    Message:

    "sys/rack-unit-1/board/storage-%s-ctlr-%d/raid-battery-%d"

    Explanation:

    This fault indicates that a controller battery relearn was aborted.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Restart the relearn process for the battery backup unit.

    Step 2 Reseat or replace the battery backup unit.

    Step 3 Replace the battery backup unit if it has exceeded 100 relearn cycles.


    Fault Details:

    Severity: info
    
    Cause: equipment-degraded
    
    mibFaultCode: 970
    
    mibFaultName: fltStorageRaidBatteryRelearnAborted
    
    moClass: storage:RaidBattery
    
    Type: equipment 
    

    fltStorageRaidBatteryRelearnFailed

    Fault Code: F0971

    Message:

    "sys/rack-unit-1/board/storage-%s-ctlr-%d/raid-battery-%d"

    Explanation:

    This fault indicates a controller battery relearn failure.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Restart the relearn process for the battery backup unit.

    Step 2 Reseat or replace the battery backup unit.

    Step 3 Replace the battery backup unit if it has exceeded 100 relearn cycles.


    Fault Details:

    Severity: warning
    
    Cause: equipment-degraded
    
    mibFaultCode: 971
    
    mibFaultName: fltStorageRaidBatteryRelearnFailed
    
    moClass: storage:RaidBattery
    
    Type: equipment
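
    The recommended actions for the two relearn faults (F0970 and F0971) can be read as a small decision sequence. The sketch below is one interpretation of those steps, assuming you already know the battery backup unit's relearn cycle count and whether a relearn has already been retried; how you obtain those values is not covered here, and the function name is hypothetical. It checks the 100-cycle limit first, because the recommended actions call for replacement in that case regardless of the other steps.

```python
MAX_RELEARN_CYCLES = 100  # limit stated in the recommended actions


def next_bbu_action(relearn_cycles: int, relearn_retried: bool) -> str:
    """Suggest the next step for a relearn-aborted or relearn-failed fault."""
    # Beyond the stated cycle limit, the unit should be replaced.
    if relearn_cycles > MAX_RELEARN_CYCLES:
        return "replace the battery backup unit"
    # Step 1: retry the relearn once before touching hardware.
    if not relearn_retried:
        return "restart the relearn process"
    # Step 2: the retry did not help, so reseat or replace the unit.
    return "reseat or replace the battery backup unit"


print(next_bbu_action(relearn_cycles=42, relearn_retried=False))
# restart the relearn process
```
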
    

    fltStorageVirtualDriveConsistencyCheckFailed

    Fault Code: F0982

    Message:

    "sys/rack-unit-1/board/storage-%s-ctlr-%d/vd-%d"

    Explanation:

    This fault indicates a consistency check failure with the virtual drive.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Initiate a consistency check on the virtual drive.

    Step 2 Replace any faulty physical drives.


    Fault Details:

    Severity: warning
    
    Cause: equipment-degraded
    
    mibFaultCode: 982
    
    mibFaultName: fltStorageVirtualDriveConsistencyCheckFailed
    
    moClass: storage:VirtualDrive
    
    Type: equipment
    

    fltStorageVirtualDriveDegraded

    Fault Code: F1008

    Message:

    "sys/rack-unit-1/board/storage-%s-ctlr-%d/vd-%d"

    Explanation:

    This fault indicates a recoverable error with the virtual drive.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Initiate a consistency check on the virtual drive.

    Step 2 Replace any faulty physical drives.


    Fault Details:

    Severity: warning
    
    Cause: equipment-degraded
    
    mibFaultCode: 1008
    
    mibFaultName: fltStorageVirtualDriveDegraded
    
    moClass: storage:VirtualDrive
    
    Type: equipment
    

    fltStorageVirtualDriveInoperable

    Fault Code: F1007

    Message:

    "sys/rack-unit-1/board/storage-%s-ctlr-%d/vd-%d"

    Explanation:

    This fault indicates a non-recoverable error with the virtual drive.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 If the data on the drive is accessible, back up the data and recreate the virtual drive.

    Step 2 Replace any faulty physical drives.

    Step 3 Check for controller errors in the MegaRAID ROM page logs.


    Fault Details:

    Severity: major
    
    Cause: equipment-inoperable
    
    mibFaultCode: 1007
    
    mibFaultName: fltStorageVirtualDriveInoperable
    
    moClass: storage:VirtualDrive
    
    Type: equipment
    

    fltStorageVirtualDriveReconstructionFailed

    Fault Code: F0981

    Message:

    "sys/rack-unit-1/board/storage-%s-ctlr-%d/vd-%d"

    Explanation:

    This fault indicates a failure in the reconstruction process of the virtual drive.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Restart the reconstruction process.


    Fault Details:

    Severity: warning
    
    Cause: equipment-degraded
    
    mibFaultCode: 981
    
    mibFaultName: fltStorageVirtualDriveReconstructionFailed
    
    moClass: storage:VirtualDrive
    
    Type: equipment
    

    fltStorageControllerInoperable

    Fault Code: F0976

    Message:

    "sys/rack-unit-1/board/storage-%s-ctlr-%d"

    Explanation:

    This fault indicates a non-recoverable storage controller failure. It is raised when the storage system cannot contact the controller for a period of time and stops retrying.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Reseat or replace the storage controller.


    Fault Details:

    Severity: warning
    
    Cause: equipment-inoperable
    
    mibFaultCode: 976
    
    mibFaultName: fltStorageControllerInoperable
    
    moClass: storage:Controller
    
    Type: equipment
    

    fltStorageControllerPatrolReadFailed

    Fault Code: F1003

    Message:

    "sys/rack-unit-1/board/storage-%s-ctlr-%d"

    Explanation:

    This fault indicates that a patrol read, the review of the storage system for potential physical disk errors, has failed.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Initiate a consistency check on the virtual drive.

    Step 2 Replace any faulty physical drives.


    Fault Details:

    Severity: warning
    
    Cause: equipment-inoperable
    
    mibFaultCode: 1003
    
    mibFaultName: fltStorageControllerPatrolReadFailed
    
    moClass: storage:Controller
    
    Type: equipment
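
    When several storage faults are active at once, it can help to address the most severe ones first. The sketch below lists the Fault Code, mibFaultName, and Severity of each storage entry in this section and filters them by a minimum severity. The ranking info < warning < minor < major < critical is an assumption about the severity levels, not something this chapter states, and the names are hypothetical.

```python
# Assumed ordering of severity levels, lowest to highest.
SEVERITY_RANK = {"info": 0, "warning": 1, "minor": 2, "major": 3, "critical": 4}

# (Fault Code, mibFaultName, Severity) for the storage entries in this section.
STORAGE_FAULTS = [
    ("F0181", "fltStorageLocalDiskInoperable", "major"),
    ("F0531", "fltStorageRaidBatteryInoperable", "major"),
    ("F0978", "fltStorageLocalDiskCopybackFailed", "warning"),
    ("F0969", "fltStorageRaidBatteryDegraded", "warning"),
    ("F0970", "fltStorageRaidBatteryRelearnAborted", "info"),
    ("F0971", "fltStorageRaidBatteryRelearnFailed", "warning"),
    ("F0982", "fltStorageVirtualDriveConsistencyCheckFailed", "warning"),
    ("F1008", "fltStorageVirtualDriveDegraded", "warning"),
    ("F1007", "fltStorageVirtualDriveInoperable", "major"),
    ("F0981", "fltStorageVirtualDriveReconstructionFailed", "warning"),
    ("F0976", "fltStorageControllerInoperable", "warning"),
    ("F1003", "fltStorageControllerPatrolReadFailed", "warning"),
]


def at_or_above(faults, minimum: str):
    """Return faults whose severity is at least `minimum`, most severe first."""
    floor = SEVERITY_RANK[minimum]
    selected = [f for f in faults if SEVERITY_RANK[f[2]] >= floor]
    # sorted() is stable, so faults of equal severity keep their listed order.
    return sorted(selected, key=lambda f: SEVERITY_RANK[f[2]], reverse=True)


for code, name, severity in at_or_above(STORAGE_FAULTS, "major"):
    print(code, name, severity)
```
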
    

    System Event Log-Related Faults

    fltSysdebugMEpLogMEpLogVeryLow

    Fault Code: F0461

    Message:

    Log capacity on Management Controller on server [id] is [capacity]

    Explanation:

    This fault typically occurs because Cisco Integrated Management Controller (CIMC) has detected that the system event log (SEL) on the server is almost full. The available capacity in the log is very low. This is an info-level fault and can be ignored if you do not want to clear the SEL at this time.

    Recommended Action:

    If you see this fault, you can clear the SEL if desired.

    Fault Details:

    Severity: info
    
    Cause: log-capacity
    
    mibFaultCode: 461
    
    mibFaultName: fltSysdebugMEpLogMEpLogVeryLow
    
    moClass: sysdebug:MEpLog
    
    Type: operational
    

    fltSysdebugMEpLogMEpLogFull

    Fault Code: F0462

    Message:

    Log capacity on Management Controller on server [id] is [capacity]

    Explanation:

    This fault typically occurs because Cisco CIMC could not transfer the SEL file to the location specified in the SEL policy. This is an info-level fault and can be ignored if you do not want to clear the SEL at this time.

    Recommended Action:

    If you see this fault, take the following actions:


    Step 1 Verify the configuration of the SEL policy to ensure that the location, user, and password provided are correct.

    Step 2 If you do want to transfer and clear the SEL and the above action did not resolve the issue, create a tech-support file and contact Cisco TAC.


    Fault Details:

    Severity: info
    
    Cause: log-capacity
    
    mibFaultCode: 462
    
    mibFaultName: fltSysdebugMEpLogMEpLogFull
    
    moClass: sysdebug:MEpLog
    
    Type: operational