Processor-Related Faults

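This chapter contains the following sections:

  • fltProcessorUnitInoperable

  • fltProcessorUnitDisabled

  • fltProcessorUnitThermalNonCritical

  • fltProcessorUnitThermalThresholdCritical

  • fltProcessorUnitThermalThresholdNonRecoverable

  • fltProcessorUnitVoltageThresholdCritical

  • fltProcessorUnitVoltageThresholdNonCritical

  • fltProcessorUnitVoltageThresholdNonRecoverable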

fltProcessorUnitInoperable

Fault Code

F0174

Description

You see one of the following messages when this fault is raised:

  • Processor [Id] is inoperable due to high temperature: Check cooling.

  • A catastrophic fault has occurred on one of the processors: Please check the processors' status.

  • Processor [Id] is operating at a high temperature: Check cooling.

  • PVCCD_P1_VRHOT: Processor 1 is operating at a high temperature: Check cooling.

  • P1_LVC3_PWRGD: Voltage rail Power Good dropped due to PSU or HW failure, please contact CISCO TAC for assistance.

  • P1_MEM23_MEMHOT: Temperature sensor corresponding to Processor 1 Memory 2/3 has asserted a Thermal Problem: Check server cooling.

Explanation

This fault indicates that the processor has encountered a catastrophic error or has exceeded preset thermal or power thresholds.

Recommended Action

If you see this fault, take the following actions:

  1. If the fault indicates a thermal problem, check whether the airflow to the server is obstructed and whether the heat sink is properly seated.

  2. If the fault indicates a power or voltage problem, replace the power supply.

  3. If the problem persists, or if it is caused by an equipment failure, create a tech-support file and contact Cisco TAC.
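The first two actions above depend on whether the message points to a thermal or a power problem. As a minimal sketch, assuming you already collect fault message text with your own monitoring tooling, a keyword match like the following could route each F0174 message to the matching action. The triage_f0174 function and its keyword patterns are hypothetical, derived only from the sample messages listed for this fault; they are not a Cisco API.

```python
import re

# Hypothetical keyword rules based on the sample F0174 messages in this chapter.
# This is an illustration only, not a Cisco-provided classifier.
THERMAL_PATTERN = re.compile(r"high temperature|thermal|MEMHOT|VRHOT", re.IGNORECASE)
POWER_PATTERN = re.compile(r"voltage|PWRGD|PSU|power good", re.IGNORECASE)


def triage_f0174(message: str) -> str:
    """Return the first recommended action that matches an F0174 message."""
    # Thermal keywords are checked first, so VRHOT/MEMHOT messages route to cooling checks.
    if THERMAL_PATTERN.search(message):
        return "thermal: check airflow and heat sink seating"
    if POWER_PATTERN.search(message):
        return "power: replace the power supply"
    # Catastrophic or unrecognized messages go straight to the TAC step.
    return "other: create a tech-support file and contact Cisco TAC"
```

Messages that match neither pattern, such as the catastrophic-fault message, fall through to the TAC step. If a thermal or power action does not clear the fault, step 3 still applies.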

Fault Details

Severity: major

Cause: equipment-inoperable

mibFaultCode: 174

mibFaultName: fltProcessorUnitInoperable

moClass: processor:Unit

Type: equipment

fltProcessorUnitDisabled

Fault Code

F0842

Description

Processor [Id] missing: Please reseat or replace Processor [Id].

Explanation

This fault indicates that a processor has been disabled.

Recommended Action

If you see this fault, take the following actions:

  1. Reseat the processor.

  2. If the problem persists, create a tech-support file and contact Cisco TAC.

Fault Details

Severity: info

Cause: equipment-disabled

mibFaultCode: 842

mibFaultName: fltProcessorUnitDisabled

moClass: processor:Unit

Type: equipment

fltProcessorUnitThermalNonCritical

Fault Code

F0175

Description

Processor [Id] Thermal threshold has crossed upper non-critical threshold: Check cooling.

Explanation

This fault occurs when the processor temperature on a server exceeds the upper non-critical threshold but is still below the critical threshold.

The possible contributing factors are as follows:

  • Temperature extremes can cause Cisco UCS equipment to operate at reduced efficiency and can lead to a variety of problems, including early degradation, chip failure, and equipment failure. In addition, extreme temperature fluctuations can cause CPUs to become loose in their sockets.

  • Cisco UCS equipment must operate in an environment that provides an inlet air temperature no colder than 50°F (10°C) and no hotter than 95°F (35°C).
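The inlet air range above can be checked with a short unit conversion, sketched here in Python under the assumption that both limits are inclusive. The names inlet_in_range and fahrenheit_to_celsius are hypothetical helpers for illustration. The check covers inlet air only, not processor temperature, whose thresholds are server-specific.

```python
# Inlet air range for Cisco UCS equipment as stated in this chapter.
INLET_MIN_C = 10.0  # 50 °F
INLET_MAX_C = 35.0  # 95 °F


def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert a Fahrenheit reading to Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def inlet_in_range(temp: float, unit: str = "C") -> bool:
    """Return True if an inlet air reading is within 10-35 °C (50-95 °F).

    Both limits are treated as inclusive (an assumption for this sketch).
    """
    if unit.upper() == "F":
        temp = fahrenheit_to_celsius(temp)
    elif unit.upper() != "C":
        raise ValueError(f"unsupported unit: {unit!r}")
    return INLET_MIN_C <= temp <= INLET_MAX_C
```

For example, an inlet reading of 100°F converts to about 37.8°C and falls outside the range, so cooling at the site should be reviewed.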

Recommended Action

If you see this fault, take the following actions:

  1. Review the product specifications to determine the temperature operating range of the server.

  2. Review the Cisco UCS Site Preparation Guide to make sure that the servers have adequate airflow, including front and back clearance.

  3. Verify that the airflow to the server is not blocked.

  4. Verify that the site cooling system is operating properly.

  5. Clean the installation site at regular intervals to avoid a buildup of dust and debris, which can cause a system to overheat.

  6. If the problem persists, create a tech-support file and contact Cisco TAC.

Fault Details

Severity: minor

Cause: thermal-problem

mibFaultCode: 175

mibFaultName: fltProcessorUnitThermalNonCritical

moClass: processor:Unit

Type: environmental

fltProcessorUnitThermalThresholdCritical

Fault Code

F0176

Description

Processor [Id] Thermal threshold has crossed upper critical threshold: Check cooling.

Explanation

This fault occurs when the processor temperature on a rack server exceeds the upper critical threshold.

The possible contributing factors are as follows:

  • Temperature extremes can cause Cisco UCS equipment to operate at reduced efficiency and can lead to a variety of problems, including early degradation, chip failure, and equipment failure. In addition, extreme temperature fluctuations can cause CPUs to become loose in their sockets.

  • Cisco UCS equipment should operate in an environment that provides an inlet air temperature no colder than 50°F (10°C) and no hotter than 95°F (35°C).

Recommended Action

If you see this fault, take the following actions:

  1. Review the product specifications to determine the temperature operating range of the server.

  2. Review the Cisco UCS Site Preparation Guide to ensure the servers have adequate airflow, including front and back clearance.

  3. Verify that the airflow to the server is not blocked.

  4. Verify that the site cooling system is operating properly.

  5. Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.

  6. If the problem persists, create a tech-support file and contact Cisco TAC.

Fault Details

Severity: critical

Cause: thermal-problem

mibFaultCode: 176

mibFaultName: fltProcessorUnitThermalThresholdCritical

moClass: processor:Unit

Type: environmental

fltProcessorUnitThermalThresholdNonRecoverable

Fault Code

F0177

Description

Processor [Id] Thermal threshold has crossed a preset threshold: Check cooling.

Explanation

This fault occurs when the processor temperature on a rack server has moved outside the operating range and crossed the non-recoverable threshold.

The possible contributing factors are as follows:

  • Temperature extremes can cause Cisco UCS equipment to operate at reduced efficiency and can lead to a variety of problems, including early degradation, chip failure, and equipment failure. In addition, extreme temperature fluctuations can cause CPUs to become loose in their sockets.

  • Cisco UCS equipment should operate in an environment that provides an inlet air temperature no colder than 50°F (10°C) and no hotter than 95°F (35°C).

Recommended Action

If you see this fault, take the following actions:

  1. Review the product specifications to determine the temperature operating range of the server.

  2. Review the Cisco UCS Site Preparation Guide to ensure the servers have adequate airflow, including front and back clearance.

  3. Verify that the airflow to the server is not blocked.

  4. Verify that the site cooling system is operating properly.

  5. Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.

  6. If the problem persists, create a tech-support file and contact Cisco TAC.

Fault Details

Severity: non-recoverable

Cause: thermal-problem

mibFaultCode: 177

mibFaultName: fltProcessorUnitThermalThresholdNonRecoverable

moClass: processor:Unit

Type: environmental

fltProcessorUnitVoltageThresholdCritical

Fault Code

F0179

Description

You see one of the following messages when this fault is raised:

  • Memory channel ([Id]) voltage is upper critical.

  • Processor [Id] voltage is upper critical.

  • Processor [Id] Voltage threshold has crossed upper critical threshold: Replace the Power Supply and verify if the issue is resolved. If the issue persists, call Cisco TAC.

Explanation

This fault occurs when the processor voltage has crossed the upper critical threshold, exceeding the specified hardware voltage rating.

Recommended Action

If you see this fault, take the following actions:

  1. Monitor the processor for further degradation.

  2. Review the SEL statistics on the CPU to determine which threshold was crossed.

  3. Replace the power supply.

    Before replacing this component, see the server-specific Installation and Service Guide for prerequisites, safety recommendations, and warnings.

  4. If the problem persists, create a tech-support file and contact Cisco TAC.
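The three voltage faults in this chapter correspond to increasing upper thresholds: non-critical (F0178), critical (F0179), and non-recoverable (F0180). The following minimal sketch shows how a reading could map to those codes, assuming a reading at or above a threshold counts as crossing it. The classify_voltage function and its threshold arguments are hypothetical; actual threshold values are platform-specific and are not given in this chapter.

```python
from typing import Optional


def classify_voltage(reading: float, non_critical: float, critical: float,
                     non_recoverable: float) -> Optional[str]:
    """Return the fault code for the highest upper threshold a reading has crossed.

    All thresholds are hypothetical inputs. A reading equal to a threshold
    is treated as crossing it (an assumption for this sketch).
    """
    if not non_critical < critical < non_recoverable:
        raise ValueError("thresholds must increase: non-critical < critical < non-recoverable")
    # Check from the most severe threshold down so the highest crossing wins.
    if reading >= non_recoverable:
        return "F0180"  # fltProcessorUnitVoltageThresholdNonRecoverable
    if reading >= critical:
        return "F0179"  # fltProcessorUnitVoltageThresholdCritical
    if reading >= non_critical:
        return "F0178"  # fltProcessorUnitVoltageThresholdNonCritical
    return None  # within the normal operating range
```

For example, with hypothetical thresholds of 1.30 V, 1.40 V, and 1.50 V, a reading of 1.45 V maps to F0179.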

Fault Details

Severity: major

Cause: voltage-problem

mibFaultCode: 179

mibFaultName: fltProcessorUnitVoltageThresholdCritical

moClass: processor:Unit

Type: equipment

fltProcessorUnitVoltageThresholdNonCritical

Fault Code

F0178

Description

You see one of the following messages when this fault is raised:

  • Memory channel ([Id]) voltage is upper non-critical.

  • Processor [Id] voltage is upper non-critical.

  • Processor [Id] Voltage threshold has crossed upper non-critical threshold: Replace the Power Supply and verify if the issue is resolved. If the issue persists, call Cisco TAC.

Explanation

This fault occurs when the processor voltage is outside the normal operating range but has not yet reached the critical threshold. In most cases, the processor recovers on its own.

Recommended Action

If you see this fault, take the following actions:

  1. Monitor the processor for further degradation.

  2. Review the SEL statistics on the CPU to determine which threshold was crossed.

  3. Replace the power supply.

    Before replacing this component, see the server-specific Installation and Service Guide for prerequisites, safety recommendations, and warnings.

  4. If the problem persists, create a tech-support file and contact Cisco TAC.

Fault Details

Severity: minor

Cause: voltage-problem

mibFaultCode: 178

mibFaultName: fltProcessorUnitVoltageThresholdNonCritical

moClass: processor:Unit

Type: equipment

fltProcessorUnitVoltageThresholdNonRecoverable

Fault Code

F0180

Description

You see one of the following messages when this fault is raised:

  • Memory channel ([Id]) voltage is upper non-recoverable.

  • Processor [Id] voltage is upper non-recoverable.

  • Processor [Id] Voltage threshold has crossed upper non-recoverable threshold: Replace the Power Supply and verify if the issue is resolved. If the issue persists, call Cisco TAC.

Explanation

This fault indicates that the processor voltage has crossed the upper non-recoverable threshold, exceeding the specified hardware voltage rating. Voltage at this level might damage the processor.

Recommended Action

If you see this fault, create a tech-support file and contact Cisco TAC.

Fault Details

Severity: critical

Cause: voltage-problem

mibFaultCode: 180

mibFaultName: fltProcessorUnitVoltageThresholdNonRecoverable

moClass: processor:Unit

Type: equipment
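If you script alert handling, the Fault Details for the processor faults in this chapter can be kept in a lookup table. The sketch below is hypothetical: the FaultDetails type, the PROCESSOR_FAULTS table, and the mib_fault_code helper are illustrative names, and the only relationship assumed between a fault code and its mibFaultCode is the one visible in this chapter (F0174 corresponds to 174). moClass is omitted because it is processor:Unit for every fault here.

```python
from typing import NamedTuple


class FaultDetails(NamedTuple):
    """Fault Details fields as listed in this chapter (moClass omitted)."""
    code: str
    name: str
    severity: str
    cause: str
    fault_type: str


# Values copied from the Fault Details of each processor fault in this chapter.
PROCESSOR_FAULTS = {
    f.code: f
    for f in [
        FaultDetails("F0174", "fltProcessorUnitInoperable", "major", "equipment-inoperable", "equipment"),
        FaultDetails("F0842", "fltProcessorUnitDisabled", "info", "equipment-disabled", "equipment"),
        FaultDetails("F0175", "fltProcessorUnitThermalNonCritical", "minor", "thermal-problem", "environmental"),
        FaultDetails("F0176", "fltProcessorUnitThermalThresholdCritical", "critical", "thermal-problem", "environmental"),
        FaultDetails("F0177", "fltProcessorUnitThermalThresholdNonRecoverable", "non-recoverable", "thermal-problem", "environmental"),
        FaultDetails("F0179", "fltProcessorUnitVoltageThresholdCritical", "major", "voltage-problem", "equipment"),
        FaultDetails("F0178", "fltProcessorUnitVoltageThresholdNonCritical", "minor", "voltage-problem", "equipment"),
        FaultDetails("F0180", "fltProcessorUnitVoltageThresholdNonRecoverable", "critical", "voltage-problem", "equipment"),
    ]
}


def mib_fault_code(code: str) -> int:
    """Derive mibFaultCode from a fault code such as 'F0174' (gives 174)."""
    if not (code.startswith("F") and code[1:].isdigit()):
        raise ValueError(f"unexpected fault code format: {code!r}")
    return int(code[1:])
```

For example, filtering the table on cause thermal-problem returns F0175, F0176, and F0177.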