Server Alarms

Server Components Alarms

The following table describes the supported alarms for servers.

Name MO Severity Explanation Recommended Action

AdapterAltImage

adapter.Unit

Warning

The VIC application is running an alternate image.

For further assistance, contact Cisco TAC.

AdapterBackupImage

adapter.Unit

Critical

The VIC U boot is running the golden IE backup image.

Trigger a firmware update of the adapter. If the alarm persists, contact Cisco TAC.

AdapterCommunicationErrors

adapter.Unit

Critical

The VIC SUDI validation state is neither passed nor failed; it could not be determined.

For further assistance, contact Cisco TAC.

AdapterCounterfeit

adapter.Unit

Critical

The VIC SUDI has been evaluated and is not valid.

For further assistance, contact Cisco TAC.

AdapterFwValidationFailed

adapter.Unit

Critical

The VIC application status cannot be determined.

For further assistance, contact Cisco TAC.

AdapterHostEthInterfaceDown adapter.HostEthInterface Critical The uplink interface is shut down, or a transient error caused the vNIC to fail.
  1. If an associated port is disabled, enable the port.

  2. Reacknowledge the server with the Ethernet adapter that has the failed link.

  3. If the above actions do not resolve the issue, create a show tech-support file and contact Cisco TAC.

AdapterHostEthInterfaceStandByActive adapter.HostEthInterface Warning The preferred path for the failover-enabled vNIC is down, so the secondary path is currently active.
  1. Update the configuration of the port or port channel to include the primary VLAN.

  2. If the above action does not resolve the issue, create a show tech-support file and contact Cisco TAC.

AdapterHostFcInterfaceDown adapter.HostFcInterface Critical The uplink interface is shut down, or a transient error caused the vHBA to fail.
  1. If an associated port is disabled, enable the port.

  2. Reacknowledge the server with the Fibre Channel adapter that has the failed link.

  3. If the above actions do not resolve the issue, create a show tech-support file and contact Cisco TAC.

AdapterLowUpgradesRemaining

adapter.Unit

Warning

The VIC FPGA has 50 or fewer upgrades remaining.

AdapterNotReachable adapter.Unit Warning Adapter is not reachable or the connectivity is not discovered from the Fabric Interconnects or FEX.
  1. Check if the corresponding Input/Output module is inserted in the chassis.

  2. Check whether CIMC and BIOS are running the recommended firmware version.

  3. If the above actions do not resolve the issue, create a show tech-support file and contact Cisco TAC.

AdapterNoUpgradesRemaining

adapter.Unit

Warning

The VIC FPGA has no more firmware upgrades remaining.
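AdapterLowUpgradesRemaining and AdapterNoUpgradesRemaining together amount to a threshold check on the number of VIC FPGA upgrades left. The Python sketch below illustrates that reading. The function name, and the choice to report only the more severe condition when the count reaches zero, are assumptions made for illustration and are not documented Intersight behavior.

```python
from typing import Optional

# Threshold taken from the AdapterLowUpgradesRemaining explanation above.
LOW_UPGRADE_THRESHOLD = 50


def upgrade_alarm(upgrades_remaining: int) -> Optional[str]:
    """Return the alarm name implied by the remaining FPGA upgrade count.

    Hypothetical helper: returns None when neither alarm applies.
    """
    if upgrades_remaining < 0:
        raise ValueError("upgrades_remaining must be non-negative")
    if upgrades_remaining == 0:
        # Assumption: at zero, only the "no upgrades" alarm is reported.
        return "AdapterNoUpgradesRemaining"
    if upgrades_remaining <= LOW_UPGRADE_THRESHOLD:
        return "AdapterLowUpgradesRemaining"
    return None
```

For example, `upgrade_alarm(50)` yields the low-upgrades alarm, while `upgrade_alarm(51)` yields no alarm.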

AdapterSecureBootFail

adapter.Unit

Critical

The VIC U Boot status cannot be determined.

For further assistance, contact Cisco TAC.

BladeWithDeployedPciePolicyReseated compute.Blade Critical This alarm occurs when a compute blade with a deployed PCIe connectivity policy is reseated in the chassis. A server profile redeployment is required to remap the PCIe devices.
ComputeBladeMigrationDetected compute.Blade Critical This alarm occurs when a server has been detected in a slot different than the one it was discovered in.
  1. Reacknowledge the server in the current slot.

  2. If the issue persists, remove the server from the current slot and reinsert it in the correct slot.

  3. Reacknowledge the server in the correct slot.

  4. If the above actions do not resolve the issue, create a show tech-support file and contact Cisco TAC.

ComputeBoardCPLDImageVerificationFailure compute.Board Critical This alarm occurs when the CPLD image verification on the server motherboard fails.

Note: This alarm is applicable to M8 and M7 servers.

Please contact Cisco TAC for further assistance.

ComputeBoardPCHSecureFuseFailure

compute.Board

Critical

This alarm occurs when Intel PCH Secure Fuse verification on the server motherboard fails.

Note: This alarm applies to M5, M6, and M7 Intel servers.

For further assistance, contact Cisco TAC.

ComputeBoardPower compute.Board Critical The motherboard has a critical power problem. This alarm occurs when the power usage sensors on the server detect that motherboard power consumption exceeds the threshold limits.
  1. Ensure that the motherboard is supplied with the required input voltage as per the product specifications.

  2. Create a show tech-support file and contact Cisco TAC to see if the motherboard needs replacement.

ComputeBoardTemperatureCritical compute.Board Critical The motherboard has a critical temperature threshold condition.
  1. Verify that the server fans are working properly.

  2. Wait for 24 hours to see if the problem resolves itself.

  3. If the above actions do not resolve the issue, create a show tech-support file and contact Cisco TAC.

ComputeBoardTemperatureWarning compute.Board Warning The motherboard has a warning temperature threshold condition.
  1. Review the product specifications to determine the operating temperature range.

  2. Power off unused blade servers and rack servers.

  3. Verify that the server fans are working properly.

  4. Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.

  5. Set the power profiling, power priority of the server, and the power restore state of the system through server Power Policy.

  6. If the above actions do not resolve the issue, create a show tech-support file and contact Cisco TAC.

ComputeBoardVoltageCritical compute.Board Critical The motherboard has a critical voltage threshold condition.
  1. Ensure that the motherboard is supplied with the required input voltage as per the product specifications.

  2. Create a show tech-support file and contact Cisco TAC.

ComputeBoardVoltageWarning compute.Board Warning The motherboard has a warning voltage threshold condition.
  1. Ensure that the motherboard is supplied with the required input voltage as per the product specifications.

  2. Create a show tech-support file and contact Cisco TAC.

ComputeCimcFirmwareNotSupported

compute.BladeIdentity

Warning

This fault indicates that one of the IO modules is missing.

Intersight Managed Mode does not support the existing firmware version. Upgrade the server using the firmware upgrade option in the Chassis tab.

ComputePciNodeInsertedPowerCycleRequired

compute.Blade Warning This alarm occurs if a PCIe node is inserted while the compute node is powered on.
  1. Power down the paired compute node.

  2. After the paired compute node is completely powered off, remove the PCIe node.

  3. Before re-inserting a PCIe node, make sure that its paired compute node is powered off.

  4. After the paired compute node has completely powered off, insert the PCIe node.

  5. Power on the PCIe node's paired compute node.

  6. After the paired compute node is completely powered on, rediscover the PCIe node.

ComputePciNodeInsertedPowerOnRequired

compute.Blade Warning This alarm occurs if a PCIe node is inserted while the compute node is powered off.
  1. After inserting the PCIe node, power on the PCIe node's paired compute node.

  2. After the paired compute node is completely powered on, rediscover the PCIe node.

ComputePciNodeRemovedPowerCycleRequired compute.Blade Warning This alarm occurs if a PCIe node is removed while the compute node is powered on.
  1. Power down the paired compute node.

  2. After the paired compute node is completely powered off, remove the PCIe node.

  3. Power on the PCIe node's paired compute node.

  4. After the paired compute node is completely powered on, rediscover the PCIe node.

ComputePciNodeRemovedPowerOnRequired

compute.Blade Warning This alarm occurs if a PCIe node is removed while the compute node is powered off.

After removing the PCIe node, power on the PCIe node's paired compute node.
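The four PCIe node insertion and removal alarms above differ only in the event (inserted or removed) and in whether the paired compute node was powered on at the time. The Python sketch below shows that mapping. The enum and function names are hypothetical; only the alarm names come from the table.

```python
from enum import Enum


class NodeEvent(Enum):
    """Hypothetical event type; the values match the alarm name fragments."""
    INSERTED = "Inserted"
    REMOVED = "Removed"


def pcie_node_alarm(event: NodeEvent, compute_node_powered_on: bool) -> str:
    """Return the alarm name for a PCIe node event.

    A powered-on compute node needs a power cycle before the change is
    picked up; a powered-off compute node only needs to be powered on.
    """
    if compute_node_powered_on:
        required = "PowerCycleRequired"
    else:
        required = "PowerOnRequired"
    return f"ComputePciNode{event.value}{required}"
```

In every case the recommended actions end with the paired compute node powered on; the power-cycle variants add a power-down step first.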

ComputePciNodeUnidentified compute.Blade Warning Unidentified PCIe node detected. PCIe node will remain powered off.
  1. Verify that the inserted PCIe node is running the recommended firmware version at Servers>Server Name>Inventory>GPUs>PCIe-Node-GPU Name>General.

  2. If the firmware is supported, create a show tech-support file and contact Cisco TAC.

ComputePciNodeUnsupported compute.Blade Warning Unsupported PCIe node detected. PCIe node will remain powered off.
  1. Verify that the PCIe node is running the recommended firmware version at Servers>Server Name>Inventory>GPUs>PCIe-Node-GPU Name>General.

  2. Verify that the paired compute node is running the recommended firmware version by checking here Servers>Server Name>General

  3. If the firmware versions are compatible, create a show tech-support file and contact Cisco TAC.

ComputePhysicalBiosPostTimeOut

compute.Physical

Critical

This alarm typically occurs when the server has encountered a BIOS POST timeout.

For further assistance, contact Cisco TAC.

ComputePhysicalMissing compute.Physical Critical This alarm occurs when a server has been removed from the slot it was discovered in.
  1. Make sure a server is inserted in the slot.

  2. Check the Power-On-Self-Test (POST) results for the server.

  3. Check the power state of the server.

  4. If the server is off, turn the server on.

  5. If the above actions do not resolve the issue, create a show tech-support file and contact Cisco TAC.

ComputePhysicalWillBoot compute.Physical Critical UCS Will Boot is a cursory check to ensure that the blade is configured properly to allow the BIOS to proceed. This alarm indicates that the server encountered a critical Will Boot error, which occurs when the CPU and DIMM configuration check fails.
  1. Verify that the DIMMs are installed in a supported configuration.

  2. Verify that an adapter and CPU are installed.

  3. Download the System Event Logs file from the GUI by clicking Servers>Server Name>... >System>Download System Event Log

  4. Review the SEL statistics on the DIMM to determine which threshold was crossed.

  5. Create a show tech-support file and contact Cisco TAC to see if the DIMM needs replacement.

ComputeRackUnauthorizedManagementVic

compute.RackUnit

Critical

The VIC SUDI verification has failed on the VIC used for management traffic.

Note: This alarm is applicable to Intersight Managed Mode only.

For further assistance, contact Cisco TAC.

ComputeRackUnitHealthCritical compute.RackUnit Critical The server's health state has reached the critical threshold.
  1. Read the fault summary and determine the course of action.

  2. If the above action does not resolve the issue, create a show tech-support file and contact Cisco TAC.

ComputeRackUnitHealthWarning compute.RackUnit Warning The server's health state has reached the warning threshold.
  1. Read the fault summary and determine the course of action.

  2. If the above action does not resolve the issue, create a show tech-support file and contact Cisco TAC.

ComputeServerAdapterUnitDeprecated compute.Physical Critical One or more adapters connected to the server are deprecated, or are not supported in the current Intersight release.
  1. Verify that only the supported adapters are installed on the server.

  2. If the above action does not resolve the issue, create a show tech-support file and contact Cisco TAC.

ComputeServerDisconnected

compute.Physical

Warning

Server is not reachable.

Check the server's network connectivity.

ComputeServerNotConnected

compute.BladeIdentity

Warning

Server discovery failed because the device is not connected.

For further assistance, contact Cisco TAC.

ComputeServerRecoveryKeyNotAvailable

cond.Alarms

Info

Cisco Intersight may fail to collect the ESXi OS recovery key from the SAN boot LUN.

Please verify the following information:

  • Supported Server Firmware:

    • Cisco UCS X-Series (M7, M8): 6.0.2.260041 or later

    • Cisco UCS C-Series (M7, M8): 6.0.2.260046 or later

  • Supported UCS Tools Versions:

    • ESXi 8.0 (CIS-ucs-tool-esxi-2.0.06 or later)

    • ESXi 9.0 (CIS-ucs-tool-esxi-2.1.05 or later)

    For more information, refer to Software Requirements for UCS Tools.
  • The server must run ESXi 8.0 or later.

Once the above checks are completed, proceed to redeploy and activate the server profile.
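The firmware prerequisites for ComputeServerRecoveryKeyNotAvailable can be checked mechanically. The Python sketch below compares a server firmware version against the minimums listed above. It assumes that versions are purely numeric, dot-separated strings that compare component by component; the function names and the series keys are hypothetical.

```python
from typing import Tuple

# Minimum server firmware versions from the list above (M7 and M8 servers).
MIN_FIRMWARE = {
    "X-Series": "6.0.2.260041",
    "C-Series": "6.0.2.260046",
}


def parse_version(version: str) -> Tuple[int, ...]:
    """Split a dotted numeric version into integers for comparison.

    Assumption: every component is a plain integer.
    """
    return tuple(int(part) for part in version.split("."))


def firmware_supported(series: str, version: str) -> bool:
    """Return True if `version` meets the minimum for the given series."""
    return parse_version(version) >= parse_version(MIN_FIRMWARE[series])
```

Comparing integer tuples rather than raw strings avoids ranking a component such as `10` below `2`.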

EquipmentFruMissing equipment.Fru Critical This alarm typically occurs when any hardware component is missing in a server, chassis, FEX or FI and the server or chassis is not rediscovered manually.

If you see this fault, take the following actions:

  1. Make sure the hardware component is inserted in the correct slot in the server.

  2. Check whether the hardware component is connected and configured properly and is running the recommended firmware version.

  3. If the above actions do not resolve the issue, create a show tech-support file and contact Cisco TAC.

EquipmentFruReplaced equipment.Fru Critical This alarm typically occurs when any adapter is replaced in a server and the server is not decommissioned and recommissioned.

If you see this fault, take the following actions:

  1. For rack servers, decommission and recommission the server if any hardware component is changed.

  2. For non-rack servers, acknowledge the server if any hardware component is changed.

  3. If no hardware component was changed, create a show tech-support file and contact Cisco TAC.

EquipmentRackFanSpeedCritical equipment.Fan Critical The server fan has a speed threshold condition. This fault typically occurs when a fan is running at a speed that is too slow or too fast. A malfunctioning fan can affect the operating temperature of the rack server.

If you see this fault, take the following actions:

  1. If the fan is running below the expected speed, ensure that the fan blades are not blocked.

  2. If the fan is running above the expected speed, remove and re-insert the fan.

  3. If the above actions do not resolve the issue, create a show tech-support file and contact Cisco TAC to see if the fan needs replacement.

EquipmentRackPsuInputLost equipment.Psu Warning The power supply has no AC input.
  1. Monitor the PSU status.

  2. Verify that the power cord is properly connected to the power supply and to the power source.

  3. If possible, remove and reseat the PSU.

  4. If the above actions do not resolve the issue, create a show tech-support file and contact Cisco TAC.

EquipmentRackPsuOutputCurrentCritical equipment.Psu Critical The power supply has an output current threshold condition.

Create a show tech-support file and contact Cisco TAC to see if the PSU needs replacement.

EquipmentRackPsuOutputCurrentWarning equipment.Psu Warning The power supply has an output current threshold condition.

Create a show tech-support file and contact Cisco TAC to see if the PSU needs replacement.

EquipmentRackPsuOutputPowerCritical equipment.Psu Critical The server power supply has an output power threshold condition. This fault occurs if the current output of the PSU in the rack server is far above or below the non-recoverable threshold value.

Create a show tech-support file and contact Cisco TAC to see if the PSU needs replacement.

EquipmentRackPsuOutputPowerWarning equipment.Psu Warning The server power supply has an output power threshold condition. This fault occurs if the current output of the PSU in the rack server is far above or below the non-recoverable threshold value.

Create a show tech-support file and contact Cisco TAC to see if the PSU needs replacement.

EquipmentRackPsuOutputVoltageCritical equipment.Psu Critical The power supply has an output voltage threshold condition.

Create a show tech-support file and contact Cisco TAC to see if the PSU needs replacement.

EquipmentRackPsuOutputVoltageWarning equipment.Psu Warning The power supply has an output voltage threshold condition.

Create a show tech-support file and contact Cisco TAC to see if the PSU needs replacement.

EquipmentRackPsuTemperatureCritical equipment.Psu Critical The power supply has a temperature threshold condition.
  1. Monitor the PSU status.

  2. Verify that the server fans are working properly.

  3. Create a show tech-support file and contact Cisco TAC to see if the fan needs replacement.

EquipmentRackPsuTemperatureWarning equipment.Psu Warning The power supply has a temperature threshold condition.
  1. Monitor the PSU status.

  2. Verify that the server fans are working properly.

  3. Create a show tech-support file and contact Cisco TAC to see if the faulty fan needs replacement.

ExpanderModuleRemoved compute.Blade Critical This alarm occurs when an X-Fabric module that connects the server's mapped PCIe devices is removed from the chassis. Reinsert the X-Fabric module(s) into the chassis and then redeploy the server profile to remap the devices.
GraphicsCardTemperatureCritical graphics.Card Critical The GPU has a critical temperature threshold condition.
  1. Verify that the server fans are working properly.

  2. Wait for 24 hours to see if the problem resolves itself.

  3. If the above actions do not resolve the issue, create a show tech-support file and contact Cisco TAC.

GraphicsCardTemperatureWarning graphics.Card Warning The GPU has a warning temperature threshold condition.
  1. Verify that the server fans are working properly.

  2. Wait for 24 hours to see if the problem resolves itself.

  3. If the above actions do not resolve the issue, create a show tech-support file and contact Cisco TAC.

MemoryUnitBankError memory.Unit Warning The memory unit has encountered a Bank Virtual lock step (VLS) error.
  1. Restart the host so that the DIMM gets auto-repaired.

  2. If the above action does not resolve the issue, create a show tech-support file and contact Cisco TAC.

MemoryUnitBistError memory.Unit Critical The memory unit has encountered a BIST error.

Create a show tech-support file and contact Cisco TAC to see if the faulty component of the DIMM needs a replacement.

MemoryUnitInvalidPopulation memory.Unit Critical The DIMM slot has been invalidly populated.
  1. Reseat the DIMM into the correct slot.

  2. If the above actions do not resolve the issue, create a show tech-support file and contact Cisco TAC.

MemoryUnitInvalidTypeError memory.Unit Critical The memory unit type is invalid.

Create a show tech-support file and contact Cisco TAC to see if the failed DIMM needs a replacement.

MemoryUnitMismatchError memory.Unit Critical A memory mismatch has been detected on this memory unit.

Create a show tech-support file and contact Cisco TAC to see if the mismatched DIMM needs a replacement.

MemoryUnitRankError memory.Unit Warning The memory unit has encountered a Rank Virtual lock step (VLS) error.
  1. Restart the host so that the DIMM gets auto-repaired.

  2. If the above action does not resolve the issue, create a show tech-support file and contact Cisco TAC.

MemoryUnitRasModeError memory.Unit Critical The memory unit has encountered a RAS Mode error.
  1. Reboot the server.

  2. If the issue persists, verify that the DIMMs are installed in a supported configuration.

  3. Reseat the DIMM.

  4. If the above actions do not resolve the issue, create a show tech-support file and contact Cisco TAC to see if the DIMM needs a replacement.

MemoryUnitSpdError memory.Unit Critical The memory unit has encountered a SPD error.

Create a show tech-support file and contact Cisco TAC to see if the faulty component of the DIMM needs a replacement.

MemoryUnitTemperatureCritical memory.Unit Critical The memory unit has a critical temperature threshold condition.
  1. Verify that the server fans are working properly.

  2. Wait for 24 hours to see if the problem resolves itself.

  3. If the above actions do not resolve the issue, create a show tech-support file and contact Cisco TAC.

MemoryUnitTemperatureWarning memory.Unit Warning The memory unit has a warning temperature threshold condition.
  1. Verify that the server fans are working properly.

  2. Wait for 24 hours to see if the problem resolves itself.

  3. If the above actions do not resolve the issue, create a show tech-support file and contact Cisco TAC.

MemoryUnitUncorrectableError memory.Unit Critical The memory unit has encountered an uncorrectable ECC error.
  1. Monitor the error statistics of the degraded DIMM.

  2. Create a show tech-support file and contact Cisco TAC to see if the inoperable DIMM needs a replacement.

NodeDcBrick1Fault pci.Node Critical A fault has been detected on DC Brick 1. Perform a PCIe node slot reset. If the issue persists, physically reseat the PCIe node. If the issue is still not resolved, contact Cisco TAC.
NodeDcBrick2Fault pci.Node Critical A fault has been detected on DC Brick 2. Perform a PCIe node slot reset. If the issue persists, physically reseat the PCIe node. If the issue is still not resolved, contact Cisco TAC.
NodeDetectedInInvalidSlot pci.Node Critical PCIe node is populated in an invalid slot. Move the PCIe node to a supported slot.
NodeHotSwapController1Fault pci.Node Critical A fault has been detected on Hot Swap Controller 1. Perform a PCIe node slot reset. If the issue persists, physically reseat the PCIe node. If the issue is still not resolved, contact Cisco TAC.
NodeHotSwapController2Fault pci.Node Critical A fault has been detected on Hot Swap Controller 2. Perform a PCIe node slot reset. If the issue persists, physically reseat the PCIe node. If the issue is still not resolved, contact Cisco TAC.
NodeIncompatibleXFM1Detected pci.Node Critical XFM Module 1 is incompatible with the PCIe node. Replace the XFM with a compatible XFM module.
NodeIncompatibleXFM2Detected pci.Node Critical XFM Module 2 is incompatible with the PCIe node. Replace the XFM with a compatible XFM module.
NodeMoved pci.Node Critical This alarm occurs when a PCIe node has been moved from its discovered location to another location. Perform a rediscover operation to remove node inventory from the old location and inventory it in the new location.
NodeMovedAndReplaced pci.Node Critical This alarm occurs when a PCIe node has been moved from its discovered location to another location, and another server or node is inserted into its original location. Perform a rediscover operation to remove node inventory from the old location and inventory it in the new location.
NodeRemoved pci.Node Critical This alarm occurs when a discovered PCIe node is physically removed from its location. Reinsert the node back into its slot or perform a remove operation to remove the node from the inventory.
NodeReplaced pci.Node Critical This alarm occurs when a discovered PCIe node is physically removed from its location and another device is inserted in that location. Perform a remove operation on the node to remove the old node from the inventory and inventory the new node.
NodeUnknownCardPresentInPCIeSlot1 pci.Node Warning An unknown PCIe card is present in PCIe slot 1. Check that the PCIe card is properly seated and that the power and MCIO cables are properly connected.
NodeUnknownCardPresentInPCIeSlot2 pci.Node Warning An unknown PCIe card is present in PCIe slot 2. Check that the PCIe card is properly seated and that the power and MCIO cables are properly connected.
NodeUnknownCardPresentInPCIeSlot3 pci.Node Warning An unknown PCIe card is present in PCIe slot 3. Check that the PCIe card is properly seated and that the power and MCIO cables are properly connected.
NodeUnknownCardPresentInPCIeSlot4 pci.Node Warning An unknown PCIe card is present in PCIe slot 4. Check that the PCIe card is properly seated and that the power and MCIO cables are properly connected.
NodeUnsupportedCardPresentInPCIeSlot1 pci.Node Warning An unsupported PCIe card is present in PCIe slot 1. Install a supported PCIe card in PCIe slot 1.
NodeUnsupportedCardPresentInPCIeSlot2 pci.Node Warning An unsupported PCIe card is present in PCIe slot 2. Install a supported PCIe card in PCIe slot 2.
NodeUnsupportedCardPresentInPCIeSlot3 pci.Node Warning An unsupported PCIe card is present in PCIe slot 3. Install a supported PCIe card in PCIe slot 3.
NodeUnsupportedCardPresentInPCIeSlot4 pci.Node Warning An unsupported PCIe card is present in PCIe slot 4. Install a supported PCIe card in PCIe slot 4.
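
The NodeMoved, NodeMovedAndReplaced, NodeRemoved, and NodeReplaced alarms listed above are distinguished by two observations about a PCIe node that is no longer in its discovered location: whether the node was found somewhere else, and whether something else now occupies its original location. The Python sketch below captures that distinction. The function and parameter names are hypothetical.

```python
def node_location_alarm(found_elsewhere: bool, original_slot_occupied: bool) -> str:
    """Return the alarm name for a PCIe node missing from its discovered slot.

    found_elsewhere: the same node was detected in a different location.
    original_slot_occupied: another device now sits in the original location.
    """
    if found_elsewhere:
        # The node still exists in the system, just in a new location.
        return "NodeMovedAndReplaced" if original_slot_occupied else "NodeMoved"
    # The node is gone from the system entirely.
    return "NodeReplaced" if original_slot_occupied else "NodeRemoved"
```

Per the table, the two moved cases call for a rediscover operation, while the removed and replaced cases call for a remove operation (or, for NodeRemoved, reinserting the node).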

PcieAuxPowerCableMissing

equipment.SharedGraphicsCard

Critical

Auxiliary PCIe power cable not detected.

  • Check that the auxiliary power cable is fully connected.

  • For further assistance, contact Cisco TAC.

PcieAuxPowerCableMissing equipment.SharedGraphicsCard Critical The auxiliary power cable for the PCIe card is not detected. Ensure that the auxiliary power cable is properly connected.
PcieAuxPowerCableMissing graphics.Card Critical The auxiliary power cable for the PCIe card is not detected. Ensure that the auxiliary power cable is properly connected.
PcieMappedDeviceNotAvailable compute.Blade Critical This alarm occurs when the Mapped PCIe devices are not available. Check the mapped PCIe devices and review related alarms on the chassis and PCIe node for further details.
PcieSlotPowerFault equipment.SharedGraphicsCard Critical A power fault has been detected on the PCIe slot. Check that the PCIe card is properly seated and that the power and MCIO cables are installed properly.
PcieSlotPowerFault graphics.Card Critical A power fault has been detected on the PCIe slot. Check that the PCIe card is properly seated and that the power and MCIO cables are installed properly.

PcieSlotPowerFault

equipment.SharedGraphicsCard

Critical

A power fault has been detected on the PCIe slot.

  • Check PCIe card seating, power cables, and MCIO cables.

  • For further assistance, contact Cisco TAC.

PciNodePCIeLinkConfigIssue pci.Node Warning PCIe link or port configuration issue detected. PCIe links may not be up or configured properly between PCIe slots and CPUs.
  1. Review the Cisco UCS X440p PCIe Node Installation and Service Guide.

  2. Ensure that all the required hardware is installed as per the guide.

  3. If the issue still persists, create a show tech-support file and contact Cisco TAC.

PciNodePowerFault pci.Node Critical PCIe node power fault detected.
  1. Review the Cisco UCS X440p PCIe Node Installation and Service Guide.

  2. Verify that the PCIe node has two dark colored GPU cables that carry power and data.

  3. Verify that the Power cables are connected to the power source and inserted into the PCIe node.

PciNodePresentXFM1Absent pci.Node Warning PCIe node detected with missing XFM1. PCIe node cannot be fully managed without both XFMs being present.
  1. Review the Cisco UCS X440p PCIe Node Installation and Service Guide.

  2. Ensure that all the required hardware is installed as per the guide.

  3. If the issue still persists, create a show tech-support file and contact Cisco TAC.

PciNodeRemoved compute.Blade Critical This alarm occurs when a PCIe node that hosts this server's mapped PCIe devices is removed from the chassis. Reinsert the node into the chassis and redeploy the server profile to remap the devices. Otherwise, undeploy the server profile to unmap the PCIe devices.
PciNodeRiser1Missing pci.Node Warning The PCIe node Riser 1 is missing. No PCIe lanes to CPU1 can be utilized.
  1. Review the Cisco UCS X440p PCIe Node Installation and Service Guide.

  2. Ensure that all the required hardware is installed as per the guide.

  3. If the issue still persists, create a show tech-support file and contact Cisco TAC.

PciNodeRiser1PowerFault pci.Node Critical PCIe node Riser 1 power fault detected.
  1. Review the Cisco UCS X440p PCIe Node Installation and Service Guide.

  2. Verify that the Power cable for Riser 1 is inserted correctly in the PCIe node.

  3. Verify that the Power cable for Riser 1 is connected to the power source.

PciNodeRiser2PowerFault pci.Node Critical PCIe node Riser 2 power fault detected.
  1. Review the Cisco UCS X440p PCIe Node Installation and Service Guide.

  2. Verify that the Power cable for Riser 2 is inserted correctly in the PCIe node.

  3. Verify that the Power cable for Riser 2 is connected to the power source.

PciNodeRiser2PresentCPU2Absent pci.Node Warning PCIe node Riser 2 is present, but CPU2 is absent. PCIe slots on Riser 2 are not connected.
  1. Review the Cisco UCS X440p PCIe Node Installation and Service Guide.

  2. Ensure that all the required hardware is installed as per the guide.

  3. If the issue still persists, create a show tech-support file and contact Cisco TAC.

PciNodeRiserMismatch pci.Node Warning The PCIe node has a riser type mismatch. Risers will remain powered off.
  1. Review the Cisco UCS X440p PCIe Node Installation and Service Guide.

  2. Mixing GPU models is not supported in the compute node. Ensure that each PCIe node is configured with the same type of GPU.

PciNodeUnknownPCIeCardPresentOnRiser1 pci.Node Warning PCIe node has an unknown PCIe card present on Riser 1. Riser will remain powered off.
  1. Review the Cisco UCS X210c M6 Compute Node Installation and Service Note and Cisco UCS X440p PCIe Node Installation and Service Guide.

  2. Install the recommended type of GPU on Riser 1.

  3. Power on the riser.

PciNodeUnknownPCIeCardPresentOnRiser2 pci.Node Warning PCIe node has an unknown PCIe card present on Riser 2. Riser will remain powered off.
  1. Review the Cisco UCS X210c M6 Compute Node Installation and Service Note and Cisco UCS X440p PCIe Node Installation and Service Guide.

  2. Install the recommended type of GPU on Riser 2.

  3. Power on the riser.

PciNodeUnsupportedPCIeCardPresentOnRiser1 pci.Node Warning PCIe node has an unsupported PCIe card present on Riser 1. Riser will remain powered off.
  1. Review the Cisco UCS X210c M6 Compute Node Installation and Service Note and Cisco UCS X440p PCIe Node Installation and Service Guide.

  2. Install the recommended type of GPU on Riser 1.

  3. Power on the riser.

PciNodeUnsupportedPCIeCardPresentOnRiser2 pci.Node Warning PCIe node has an unsupported PCIe card present on Riser 2. Riser will remain powered off.
  1. Review the Cisco UCS X210c M6 Compute Node Installation and Service Note and Cisco UCS X440p PCIe Node Installation and Service Guide.

  2. Install the recommended type of GPU on Riser 2.

  3. Power on the riser.

PeerNVLinkedGpuNotMapped graphics.Card Critical This alarm occurs when NVLinked GPUs are not mapped to the same server. Ensure NVLinked GPUs are mapped to the same server.
ProcessorUnitCatErr processor.Unit Critical The processor has encountered a CATERR error. The system event log (SEL) contains events related to the processor's catastrophic error (CATERR) sensor.

Create a show tech-support file and contact Cisco TAC.

ProcessorUnitTemperatureCritical processor.Unit Critical The processor has a critical temperature threshold condition.
  1. Verify that the server fans are working properly.

  2. Wait for 24 hours to see if the problem resolves itself.

  3. If the above actions do not resolve the issue, create a show tech-support file and contact Cisco TAC.

ProcessorUnitTemperatureWarning processor.Unit Warning The processor has a warning temperature threshold condition.
  1. Verify that the server fans are working properly.

  2. Wait for 24 hours to see if the problem resolves itself.

  3. If the above actions do not resolve the issue, create a show tech-support file and contact Cisco TAC.

ProcessorUnitThermtrip processor.Unit Critical The processor has encountered a THERMTRIP error.
  1. Review the product specifications to determine the temperature operating range of the server.

  2. Verify that the server fans are working properly.

  3. Verify that the airflow on the Cisco UCS chassis or rack server is not obstructed.

  4. Power off unused blade servers and rack servers.

  5. Set the power profiling, power priority of the server, and the power restore state of the system through server Power Policy.

  6. If the above actions do not resolve the issue, create a show tech-support file and contact Cisco TAC.

RackFanSpeedWarning

equipment.Fan

Warning

The server fan has a warning speed threshold condition.

  • Verify server fans are operating normally.

  • Verify input voltage is within supported range.

  • For further assistance, contact Cisco TAC.

RackPsuDetectionFailure

equipment.Psu

Critical

The health state monitor detects a PSU failure.

  • Verify PSU seating and power cable connection.

  • Verify input voltage is within supported range.

  • Reseat or replace PSU.

  • For further assistance, contact Cisco TAC.

RackPsuOutputCurrentWarning

equipment.Psu

Warning

PSU output current above warning threshold.

  • Monitor PSU status.

  • For further assistance, contact Cisco TAC.

RackPsuOutputVoltageWarning

equipment.Psu

Warning

PSU output voltage has crossed the warning threshold.

  • Monitor PSU status.

  • Verify cooling.

  • For further assistance, contact Cisco TAC.

RackPsuPredictiveFailure

equipment.Psu

Critical

The PSU is predicted to fail.

  • Verify power input and PSU seating.

  • Replace PSU if prediction persists.

  • For further assistance, contact Cisco TAC.

RackPsuTemperatureWarning

equipment.Psu

Warning

PSU temperature above warning threshold.

  • Verify rack server cooling and airflow.

  • For further assistance, contact Cisco TAC.

ServerProfileStateOutOfSyncWarning

server.Profile

Warning

The server profile has moved to the Out-of-sync state.

  1. Evaluate the differences between the server profile configuration and the end-point configuration.

  2. Redeploy the server profile to apply its configuration to the endpoint.

ServerProfileStatePendingChangesWarning

server.Profile

Warning

The server profile has moved to the Pending-changes state.

Check the server policy configuration for pending changes, then deploy the server profile again to apply them.

StorageControllerAuthFailure storage.Controller Critical This alarm occurs when SPDM authentication fails for the storage controller.

If you see this fault, take the following actions:

  1. Check whether the storage controller is in the list of supported controllers. If it is not, create a show tech-support file and contact Cisco TAC to replace it with a supported controller.

  2. If the Storage Controller firmware has been updated, reboot the controller.

StorageControllerFailed storage.Controller Critical This alarm occurs when the storage controller is in failed state. If the storage controller is in failed state, create a show tech-support file and contact Cisco TAC to see if the controller needs replacement.
StorageControllerFlashDegraded storage.Controller Critical This alarm occurs when the storage controller is functional, but the on-board flash has degraded.

If you see this fault, take the following actions:

  1. Reset the CIMC and update the Board Controller firmware.

  2. For PCI and mezz-based controllers, check the seating of the storage controller. If the problem persists, create a show tech-support file and contact Cisco TAC to see if the controller needs replacement.

StorageControllerFlashFailed storage.Controller Critical This alarm occurs when the storage controller is functional but the on-board flash has failed.

If the flash is in failed state, create a show tech-support file and contact Cisco TAC to see if the controller needs replacement.

StorageControllerForeignConfig storage.Controller Critical This alarm occurs when foreign configurations are present in the physical drives attached to the storage controller.

If you see this fault, take the following actions:

  1. In the GUI, navigate to Servers > Server Name > Inventory > Storage Controller > Controller Name, and click Clear Foreign Configuration in the ellipsis menu.

  2. If the above actions do not resolve the issue, create a show tech-support file and contact Cisco TAC.

StorageControllerInvalidConfiguration storage.Controller Critical This alarm occurs when the storage controller contains invalid configuration.
  1. Check whether the storage controller is in the list of supported controllers.

  2. If not, create a show tech-support file and contact Cisco TAC to replace with a supported controller.

  3. If the above actions do not resolve the issue, create a show tech-support file and contact Cisco TAC.

StorageControllerInvalidFirmware storage.Controller Critical This alarm occurs when the storage controller contains invalid firmware.
  1. Update the firmware of the Storage Controller.

  2. Reboot the controller.

  3. If the above actions do not resolve the issue, create a show tech-support file and contact Cisco TAC.

StorageControllerLostConfiguration storage.Controller Critical This alarm occurs when the storage controller has lost its configuration data.

When you replace a RAID controller, the RAID configuration that is stored in the controller is lost.

Use this procedure to restore your RAID configuration to the new RAID Controller.

  • For Legacy mode:

    1. Power off the server and replace the RAID controller.

    2. Reboot the server.

    3. Press F to import foreign configuration(s) when you see the on-screen prompt.

  • For UEFI Boot mode:

    1. Check if the server is configured in Unified Extensible Firmware Interface (UEFI) mode.

    2. Power off the server and replace the RAID controller.

    3. Reboot the server.

    4. Press F2 when prompted to enter the BIOS Setup utility.

    5. Under Setup Utility, navigate to Advanced > Select controller > Configure, and click Import foreign configuration.

If the above actions do not resolve the issue, create a show tech-support file and contact Cisco TAC.

StorageControllerUnresponsive storage.Controller Critical This alarm occurs when contact with the storage controller is probably lost, and the storage controller has become unresponsive. For PCI and mezz-based storage controllers, check the seating of the storage controller. If the problem persists, create a show tech-support file and contact Cisco TAC to see if the controller needs replacement.
StoragePhysicalDiskFailed storage.PhysicalDisk Critical This alarm occurs when the storage physical disk is in failed state. If the drive is in failed state, create a show tech-support file and contact Cisco TAC to see if the disk needs to be replaced.
StoragePhysicalDiskForeignConfig storage.PhysicalDisk Critical This alarm occurs when the storage physical disk contains a foreign configuration.

If you see this fault, take the following actions:

  1. Review the Storage Policy configuration in the service profile and verify that the selected server meets the requirements in the policy.

  2. If applicable, reseat the disks.

  3. If the above actions do not resolve the issue, create a show tech-support file and contact Cisco TAC to see if the disks need replacement.

StoragePhysicalDiskOffline storage.PhysicalDisk Critical This alarm occurs when the storage physical disk is in the Offline state.

If you see this fault, take the following actions:

  1. Verify the presence and health of physical disks.

  2. If applicable, reseat the disks.

  3. If the above actions do not resolve the issue, create a show tech-support file and contact Cisco TAC to replace the used disks.

StoragePhysicalDiskPredictiveFailure storage.PhysicalDisk Critical This alarm occurs when the storage physical disk is in predictive-failure state. If the drive is in predictive-failure state, create a show tech-support file and contact Cisco TAC to see if the disk needs to be replaced.

StoragePhysicalDiskReadyForRemoval

storage.PhysicalDisk

Informational (Info)

The physical disk is in quiesced state and ready for removal.

For further assistance, contact Cisco TAC.

StoragePhysicalDiskRebuilding

storage.PhysicalDisk

Informational (Info)

The physical disk is in rebuilding state.

For further assistance, contact Cisco TAC.

StoragePhysicalDiskSelfTestFail storage.PhysicalDisk Critical This alarm occurs when the self-test on a storage physical disk has failed.

Create a show tech-support file and contact Cisco TAC.

StoragePhysicalDiskUnConfiguredBad storage.PhysicalDisk Warning This alarm occurs when the storage physical disk is in Unconfigured Bad state and is not available for RAID volume.

If you see this fault, take the following actions:

  1. Verify the connectivity between the physical disks and the RAID controller.

  2. Verify the presence and health of physical disks.

  3. Reseat the disks.

  4. If the above actions do not resolve the issue, create a show tech-support file and contact Cisco TAC to see if the used disks need replacement.

StorageRaidBatteryDegraded storage.BatteryBackupUnit Critical This alarm occurs when the storage battery backup unit is in degraded state.

If you see this fault, take the following actions:

  1. If the fault reason indicates the backup unit is in a relearning cycle, wait for relearning to complete.

  2. If the fault reason indicates the backup unit is about to fail, create a show tech-support file and contact Cisco TAC to see if the backup unit needs replacement.

StorageVirtualDriveCacheDegraded

storage.VirtualDrive

Warning

Virtual drive cache is in degraded state.

For further assistance, contact Cisco TAC.

StorageVirtualDriveDegraded storage.VirtualDrive Critical This alarm occurs when the storage virtual drive is in degraded state.

If you see this fault, take the following actions:

  1. If the drive is performing a consistency check operation, wait for the operation to complete.

  2. Verify the presence and health of disks that are used by the virtual drive.

  3. If applicable, reseat the disks.

  4. If the above actions do not resolve the issue, create a show tech-support file and contact Cisco TAC to see if the used disks need to be replaced.

StorageVirtualDriveOffline storage.VirtualDrive Critical This alarm occurs when the storage virtual drive is in offline state.

If you see this fault, take the following actions:

  1. Verify the presence and health of disks that are used by the virtual drive.

  2. If applicable, reseat the disks.

  3. If the above actions do not resolve the issue, create a show tech-support file and contact Cisco TAC to see if the disks need replacement.

StorageVirtualDrivePartiallyDegraded storage.VirtualDrive Critical The storage virtual drive is partially degraded. The operating condition of the virtual drive is not optimal.

If you see this fault, take the following actions:

  1. If the drive is performing a consistency check operation, wait for the operation to complete.

  2. Verify the presence and health of disks that are used by the virtual drive.

  3. If applicable, reseat the disks.

  4. If the above actions do not resolve the issue, create a show tech-support file and contact Cisco TAC to see if the disks need replacement.
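
For scripted triage, rows of this table can be held in a small lookup structure. The following is a minimal sketch under stated assumptions: the `AlarmEntry` type, the `triage` and `by_mo` helpers, and the severity ranking are hypothetical and not part of any Cisco API. The names, MOs, and severities are copied from a few storage rows above, with "Informational (Info)" shortened to "Info", and the catalog is deliberately partial.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlarmEntry:
    name: str      # alarm name as listed in the table
    mo: str        # managed object type the alarm is raised on
    severity: str  # "Critical", "Warning", or "Info"


# A few rows copied from the table; this catalog is intentionally incomplete.
CATALOG = [
    AlarmEntry("StoragePhysicalDiskRebuilding", "storage.PhysicalDisk", "Info"),
    AlarmEntry("StorageVirtualDriveCacheDegraded", "storage.VirtualDrive", "Warning"),
    AlarmEntry("StorageVirtualDriveOffline", "storage.VirtualDrive", "Critical"),
    AlarmEntry("StoragePhysicalDiskUnConfiguredBad", "storage.PhysicalDisk", "Warning"),
    AlarmEntry("StorageControllerFailed", "storage.Controller", "Critical"),
]

# Assumed triage order for this sketch: most severe first.
SEVERITY_RANK = {"Critical": 0, "Warning": 1, "Info": 2}


def triage(entries):
    """Sort entries by severity, then by name for stable output."""
    return sorted(entries, key=lambda e: (SEVERITY_RANK[e.severity], e.name))


def by_mo(entries, mo):
    """Return only the entries raised on the given managed object type."""
    return [e for e in entries if e.mo == mo]


if __name__ == "__main__":
    for entry in triage(CATALOG):
        print(f"{entry.severity:<8} {entry.mo:<22} {entry.name}")
```

Sorting on a tuple of rank and name keeps Critical alarms at the top and makes the order deterministic within each severity.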