Server Components Alarms
The following table describes the supported alarms for servers.
| Name | MO | Severity | Explanation | Recommended Action |
|---|---|---|---|---|
| AdapterAltImage | adapter.Unit | Warning | The VIC application is running an alternate image. | For further assistance, contact Cisco TAC. |
| AdapterBackupImage | adapter.Unit | Critical | The VIC U-Boot is running the golden (that is, backup) image. | Trigger a firmware update of the adapter. If the alarm persists, contact Cisco TAC. |
| AdapterCommunicationErrors | adapter.Unit | Critical | The VIC SUDI validation state is neither passed nor failed and cannot be determined. | For further assistance, contact Cisco TAC. |
| AdapterCounterfeit | adapter.Unit | Critical | The VIC SUDI has been evaluated and is not valid. | For further assistance, contact Cisco TAC. |
| AdapterFwValidationFailed | adapter.Unit | Critical | The VIC application status cannot be determined. | For further assistance, contact Cisco TAC. |
| AdapterHostEthInterfaceDown | adapter.HostEthInterface | Critical | The uplink interface is shut down, or a transient error caused the vNIC to fail. | |
| AdapterHostEthInterfaceStandByActive | adapter.HostEthInterface | Warning | The preferred path for the failover-enabled vNIC is down, so the secondary path is currently active. | |
| AdapterHostFcInterfaceDown | adapter.HostFcInterface | Critical | The uplink interface is shut down, or a transient error caused the vHBA to fail. | |
| AdapterLowUpgradesRemaining | adapter.Unit | Warning | The VIC FPGA has 50 or fewer upgrades remaining. | |
| AdapterNotReachable | adapter.Unit | Warning | The adapter is not reachable, or its connectivity is not discovered from the Fabric Interconnects or FEX. | |
| AdapterNoUpgradesRemaining | adapter.Unit | Warning | The VIC FPGA has no more firmware upgrades remaining. | |
| AdapterSecureBootFail | adapter.Unit | Critical | The VIC U-Boot status cannot be determined. | For further assistance, contact Cisco TAC. |
| BladeWithDeployedPciePolicyReseated | compute.Blade | Critical | This alarm occurs when a compute blade with a deployed PCIe connectivity policy is reseated in the chassis. | A server profile redeployment is required to remap the PCIe devices. |
| ComputeBladeMigrationDetected | compute.Blade | Critical | This alarm occurs when a server has been detected in a slot different than the one it was discovered in. | |
| ComputeBoardCPLDImageVerificationFailure | compute.Board | Critical | This alarm occurs when the CPLD image verification on the server motherboard fails. | Please contact Cisco TAC for further assistance. |
| ComputeBoardPCHSecureFuseFailure | compute.Board | Critical | This alarm occurs when Intel PCH Secure Fuse verification on the server motherboard fails. | For further assistance, contact Cisco TAC. |
| ComputeBoardPower | compute.Board | Critical | The motherboard has a critical power problem. This occurs when the motherboard power consumption exceeds certain threshold limits. At that time the power usage sensors on a server detect a problem. | |
| ComputeBoardTemperatureCritical | compute.Board | Critical | The motherboard has a critical temperature threshold condition. | |
| ComputeBoardTemperatureWarning | compute.Board | Warning | The motherboard has a warning temperature threshold condition. | |
| ComputeBoardVoltageCritical | compute.Board | Critical | The motherboard has a critical voltage threshold condition. | |
| ComputeBoardVoltageWarning | compute.Board | Warning | The motherboard has a warning voltage threshold condition. | |
| ComputeCimcFirmwareNotSupported | compute.BladeIdentity | Warning | This fault indicates that one of the IO modules is missing. | Intersight Managed Mode does not support the existing firmware version. Upgrade the server using the firmware upgrade option in the Chassis tab. |
| ComputePciNodeInsertedPowerCycleRequired | compute.Blade | Warning | This alarm occurs if a PCIe node is inserted while the compute node is powered on. | |
| ComputePciNodeInsertedPowerOnRequired | compute.Blade | Warning | This alarm occurs if a PCIe node is inserted while the compute node is powered off. | |
| ComputePciNodeRemovedPowerCycleRequired | compute.Blade | Warning | This alarm occurs if a PCIe node is removed while the compute node is powered on. | |
| ComputePciNodeRemovedPowerOnRequired | compute.Blade | Warning | This alarm occurs if a PCIe node is removed while the compute node is powered off. | After removing the PCIe node, power on the PCIe node's paired compute node. |
| ComputePciNodeUnidentified | compute.Blade | Warning | Unidentified PCIe node detected. PCIe node will remain powered off. | |
| ComputePciNodeUnsupported | compute.Blade | Warning | Unsupported PCIe node detected. PCIe node will remain powered off. | |
| ComputePhysicalBiosPostTimeOut | compute.Physical | Critical | This alarm typically occurs when the server has encountered a BIOS POST timeout. | For further assistance, contact Cisco TAC. |
| ComputePhysicalMissing | compute.Physical | Critical | This alarm occurs when a server has been removed from the slot it was discovered in. | |
| ComputePhysicalWillBoot | compute.Physical | Critical | The UCS Will Boot check is a cursory check to ensure that the blade is configured properly to allow the BIOS to proceed. This alarm indicates that the server has encountered a critical Will Boot error, which occurs when the CPU and DIMM configuration check fails. | |
| ComputeRackUnauthorizedManagementVic | compute.RackUnit | Critical | The VIC SUDI verification has failed on the VIC used for management traffic. | For further assistance, contact Cisco TAC. |
| ComputeRackUnitHealthCritical | compute.RackUnit | Critical | The server's health state has reached the critical threshold. | |
| ComputeRackUnitHealthWarning | compute.RackUnit | Warning | The server's health state has reached the warning threshold. | |
| ComputeServerAdapterUnitDeprecated | compute.Physical | Critical | One or more adapters connected to the server are deprecated, or are not supported in the current Intersight release. | |
| ComputeServerDisconnected | compute.Physical | Warning | Server is not reachable. | If you see this alarm, check the server's network connectivity. |
| ComputeServerNotConnected | compute.BladeIdentity | Warning | Server discovery failed because the device is not connected. | For further assistance, contact Cisco TAC. |
| ComputeServerRecoveryKeyNotAvailable | cond.Alarms | Info | Cisco Intersight may fail to collect the ESXi OS recovery key from the SAN boot LUN. | Please verify the following information: Once the above checks are completed, proceed to redeploy and activate the server profile. |
| EquipmentFruMissing | equipment.Fru | Critical | This alarm typically occurs when any hardware component is missing in a server, chassis, FEX or FI and the server or chassis is not rediscovered manually. | If you see this fault, take the following actions: |
| EquipmentFruReplaced | equipment.Fru | Critical | This alarm typically occurs when any adapter is replaced in a server and the server is not decommissioned and recommissioned. | If you see this fault, take the following actions: |
| EquipmentRackFanSpeedCritical | equipment.Fan | Critical | The server fan has a speed threshold condition. This fault typically occurs when a fan is running at a speed that is too slow or too fast. A malfunctioning fan can affect the operating temperature of the rack server. | If you see this fault, take the following actions: |
| EquipmentRackPsuInputLost | equipment.Psu | Warning | The power supply has no AC input. | |
| EquipmentRackPsuOutputCurrentCritical | equipment.Psu | Critical | The power supply has an output current threshold condition. | Create a |
| EquipmentRackPsuOutputCurrentWarning | equipment.Psu | Warning | The power supply has an output current threshold condition. | Create a |
| EquipmentRackPsuOutputPowerCritical | equipment.Psu | Critical | The server power supply has an output power threshold condition. This fault occurs if the current output of the PSU in the rack server is far above or below the non-recoverable threshold value. | Create a |
| EquipmentRackPsuOutputPowerWarning | equipment.Psu | Warning | The server power supply has an output power threshold condition. This fault occurs if the current output of the PSU in the rack server is far above or below the non-recoverable threshold value. | Create a |
| EquipmentRackPsuOutputVoltageCritical | equipment.Psu | Critical | The power supply has an output voltage threshold condition. | Create a |
| EquipmentRackPsuOutputVoltageWarning | equipment.Psu | Warning | The power supply has an output voltage threshold condition. | Create a |
| EquipmentRackPsuTemperatureCritical | equipment.Psu | Critical | The power supply has a temperature threshold condition. | |
| EquipmentRackPsuTemperatureWarning | equipment.Psu | Warning | The power supply has a temperature threshold condition. | |
| ExpanderModuleRemoved | compute.Blade | Critical | This alarm occurs when an X-Fabric module that connects the server's mapped PCIe devices is removed from the chassis. | Reinsert the X-Fabric module(s) into the chassis and then redeploy the server profile to remap the devices. |
| GraphicsCardTemperatureCritical | graphics.Card | Critical | The GPU has a critical temperature threshold condition. | |
| GraphicsCardTemperatureWarning | graphics.Card | Warning | The GPU has a warning temperature threshold condition. | |
| MemoryUnitBankError | memory.Unit | Warning | The memory unit has encountered a Bank Virtual lock step (VLS) error. | |
| MemoryUnitBistError | memory.Unit | Critical | The memory unit has encountered a BIST error. | Create a |
| MemoryUnitInvalidPopulation | memory.Unit | Critical | The DIMM slot has been invalidly populated. | |
| MemoryUnitInvalidTypeError | memory.Unit | Critical | The memory unit type is invalid. | Create a |
| MemoryUnitMismatchError | memory.Unit | Critical | A memory mismatch has been detected on this memory unit. | Create a |
| MemoryUnitRankError | memory.Unit | Warning | The memory unit has encountered a Rank Virtual lock step (VLS) error. | |
| MemoryUnitRasModeError | memory.Unit | Critical | The memory unit has encountered a RAS Mode error. | |
| MemoryUnitSpdError | memory.Unit | Critical | The memory unit has encountered an SPD error. | Create a |
| MemoryUnitTemperatureCritical | memory.Unit | Critical | The memory unit has a critical temperature threshold condition. | |
| MemoryUnitTemperatureWarning | memory.Unit | Warning | The memory unit has a warning temperature threshold condition. | |
| MemoryUnitUncorrectableError | memory.Unit | Critical | The memory unit has encountered an uncorrectable ECC error. | |
| NodeDcBrick1Fault | pci.Node | Critical | A fault has been detected on DC Brick 1. | Perform a PCIe node slot reset. If the issue persists, physically reseat the PCIe node. If the issue is still not resolved, contact Cisco TAC. |
| NodeDcBrick2Fault | pci.Node | Critical | A fault has been detected on DC Brick 2. | Perform a PCIe node slot reset. If the issue persists, physically reseat the PCIe node. If the issue is still not resolved, contact Cisco TAC. |
| NodeDetectedInInvalidSlot | pci.Node | Critical | PCIe node is populated in an invalid slot. | Move the PCIe node to a supported slot. |
| NodeHotSwapController1Fault | pci.Node | Critical | A fault has been detected on Hot Swap Controller 1. | Perform a PCIe node slot reset. If the issue persists, physically reseat the PCIe node. If the issue is still not resolved, contact Cisco TAC. |
| NodeHotSwapController2Fault | pci.Node | Critical | A fault has been detected on Hot Swap Controller 2. | Perform a PCIe node slot reset. If the issue persists, physically reseat the PCIe node. If the issue is still not resolved, contact Cisco TAC. |
| NodeIncompatibleXFM1Detected | pci.Node | Critical | XFM Module 1 is incompatible with the PCIe node. | Replace the XFM with a compatible XFM module. |
| NodeIncompatibleXFM2Detected | pci.Node | Critical | XFM Module 2 is incompatible with the PCIe node. | Replace the XFM with a compatible XFM module. |
| NodeMoved | pci.Node | Critical | This alarm occurs when a PCIe node has been moved from its discovered location to another location. | Perform a rediscover operation to remove the node inventory from the old location and inventory it in the new location. |
| NodeMovedAndReplaced | pci.Node | Critical | This alarm occurs when a PCIe node has been moved from its discovered location to another location, and another server or node is inserted into its original location. | Perform a rediscover operation to remove the node inventory from the old location and inventory it in the new location. |
| NodeRemoved | pci.Node | Critical | This alarm occurs when a discovered PCIe node is physically removed from its location. | Reinsert the node into its slot, or perform a remove operation to remove the node from the inventory. |
| NodeReplaced | pci.Node | Critical | This alarm occurs when a discovered PCIe node is physically removed from its location and another device is inserted in that location. | Perform a remove operation on the node to remove the old node from the inventory and inventory the new node. |
| NodeUnknownCardPresentInPCIeSlot1 | pci.Node | Warning | An unknown PCIe card is present in PCIe slot 1. | Check that the PCIe card is properly seated and that the power and MCIO cables are properly connected. |
| NodeUnknownCardPresentInPCIeSlot2 | pci.Node | Warning | An unknown PCIe card is present in PCIe slot 2. | Check that the PCIe card is properly seated and that the power and MCIO cables are properly connected. |
| NodeUnknownCardPresentInPCIeSlot3 | pci.Node | Warning | An unknown PCIe card is present in PCIe slot 3. | Check that the PCIe card is properly seated and that the power and MCIO cables are properly connected. |
| NodeUnknownCardPresentInPCIeSlot4 | pci.Node | Warning | An unknown PCIe card is present in PCIe slot 4. | Check that the PCIe card is properly seated and that the power and MCIO cables are properly connected. |
| NodeUnsupportedCardPresentInPCIeSlot1 | pci.Node | Warning | An unsupported PCIe card is present in PCIe slot 1. | Install a supported PCIe card in PCIe slot 1. |
| NodeUnsupportedCardPresentInPCIeSlot2 | pci.Node | Warning | An unsupported PCIe card is present in PCIe slot 2. | Install a supported PCIe card in PCIe slot 2. |
| NodeUnsupportedCardPresentInPCIeSlot3 | pci.Node | Warning | An unsupported PCIe card is present in PCIe slot 3. | Install a supported PCIe card in PCIe slot 3. |
| NodeUnsupportedCardPresentInPCIeSlot4 | pci.Node | Warning | An unsupported PCIe card is present in PCIe slot 4. | Install a supported PCIe card in PCIe slot 4. |
| PcieAuxPowerCableMissing | equipment.SharedGraphicsCard | Critical | Auxiliary PCIe power cable not detected. | |
| PcieAuxPowerCableMissing | equipment.SharedGraphicsCard | Critical | The auxiliary power cable for the PCIe card is not detected. | Ensure that the auxiliary power cable is properly connected. |
| PcieAuxPowerCableMissing | graphics.Card | Critical | The auxiliary power cable for the PCIe card is not detected. | Ensure that the auxiliary power cable is properly connected. |
| PcieMappedDeviceNotAvailable | compute.Blade | Critical | This alarm occurs when the mapped PCIe devices are not available. | Check the mapped PCIe devices and review related alarms on the chassis and PCIe node for further details. |
| PcieSlotPowerFault | equipment.SharedGraphicsCard | Critical | A power fault has been detected on the PCIe slot. | Check that the PCIe card is properly seated and that the power and MCIO cables are installed properly. |
| PcieSlotPowerFault | graphics.Card | Critical | A power fault has been detected on the PCIe slot. | Check that the PCIe card is properly seated and that the power and MCIO cables are installed properly. |
| PcieSlotPowerFault | equipment.SharedGraphicsCard | Critical | A power fault has been detected on the PCIe slot. | |
| PciNodePCIeLinkConfigIssue | pci.Node | Warning | PCIe link or port configuration issue detected. PCIe links may not be up or configured properly between PCIe slots and CPUs. | |
| PciNodePowerFault | pci.Node | Critical | PCIe node power fault detected. | |
| PciNodePresentXFM1Absent | pci.Node | Warning | PCIe node detected with missing XFM1. PCIe node cannot be fully managed without both XFMs being present. | |
| PciNodeRemoved | compute.Blade | Critical | This alarm occurs when a PCIe node that hosts this server's mapped PCIe devices is removed from the chassis. | Reinsert the node into the chassis and redeploy the server profile to remap the devices. Otherwise, undeploy the server profile to unmap the PCIe devices. |
| PciNodeRiser1Missing | pci.Node | Warning | The PCIe node Riser 1 is missing. No PCIe lanes to CPU1 can be utilized. | |
| PciNodeRiser1PowerFault | pci.Node | Critical | PCIe node Riser 1 power fault detected. | |
| PciNodeRiser2PowerFault | pci.Node | Critical | PCIe node Riser 2 power fault detected. | |
| PciNodeRiser2PresentCPU2Absent | pci.Node | Warning | PCIe node Riser 2 is present, but CPU2 is absent. PCIe slots on Riser 2 are not connected. | |
| PciNodeRiserMismatch | pci.Node | Warning | PCIe node riser type mismatch detected. Risers will remain powered off. | |
| PciNodeUnknownPCIeCardPresentOnRiser1 | pci.Node | Warning | PCIe node has an unknown PCIe card present on Riser 1. Riser will remain powered off. | |
| PciNodeUnknownPCIeCardPresentOnRiser2 | pci.Node | Warning | PCIe node has an unknown PCIe card present on Riser 2. Riser will remain powered off. | |
| PciNodeUnsupportedPCIeCardPresentOnRiser1 | pci.Node | Warning | PCIe node has an unsupported PCIe card present on Riser 1. Riser will remain powered off. | |
| PciNodeUnsupportedPCIeCardPresentOnRiser2 | pci.Node | Warning | PCIe node has an unsupported PCIe card present on Riser 2. Riser will remain powered off. | |
| PeerNVLinkedGpuNotMapped | graphics.Card | Critical | This alarm occurs when NVLinked GPUs are not mapped to the same server. | Ensure NVLinked GPUs are mapped to the same server. |
| ProcessorUnitCatErr | processor.Unit | Critical | The processor has encountered a CATERR error. The system event log (SEL) contains events related to the processor's catastrophic error (CATERR) sensor. | Create a |
| ProcessorUnitTemperatureCritical | processor.Unit | Critical | The processor has a critical temperature threshold condition. | |
| ProcessorUnitTemperatureWarning | processor.Unit | Warning | The processor has a warning temperature threshold condition. | |
| ProcessorUnitThermtrip | processor.Unit | Critical | The processor has encountered a THERMTRIP error. | |
| RackFanSpeedWarning | equipment.Fan | Warning | The server fan has a warning speed threshold condition. | |
| RackPsuDetectionFailure | equipment.Psu | Critical | The health state monitor detects a PSU failure. | |
| RackPsuOutputCurrentWarning | equipment.Psu | Warning | PSU temperature above warning threshold. | |
| RackPsuOutputVoltageWarning | equipment.Psu | Warning | PSU temperature above warning threshold. | |
| RackPsuPredictiveFailure | equipment.Psu | Critical | The PSU is predicted to fail. | |
| RackPsuTemperatureWarning | equipment.Psu | Warning | PSU temperature above warning threshold. | |
| ServerProfileStateOutOfSyncWarning | server.Profile | Warning | The server profile has moved to the Out-of-sync state. | |
| ServerProfileStatePendingChangesWarning | server.Profile | Warning | The server profile has moved to the Pending-changes state. | Check the server policy configuration for pending changes and deploy the server profile again to apply the changes. |
| StorageControllerAuthFailure | storage.Controller | Critical | This alarm occurs when SPDM authentication fails for the storage controller. | If you see this fault, take the following actions: |
| StorageControllerFailed | storage.Controller | Critical | This alarm occurs when the storage controller is in failed state. | If the Storage controller is in failed state, create a show tech-support file and contact Cisco TAC to see if the controller needs replacement. |
| StorageControllerFlashDegraded | storage.Controller | Critical | This alarm occurs when the storage controller is functional, but the on-board flash has degraded. | If you see this fault, take the following action: |
| StorageControllerFlashFailed | storage.Controller | Critical | This alarm occurs when the storage controller is functional but the on-board flash has failed. | If the flash is in failed state, create a |
| StorageControllerForeignConfig | storage.Controller | Critical | This alarm occurs when foreign configurations are present in the physical drives attached to the storage controller. | If you see this fault, take the following actions: |
| StorageControllerInvalidConfiguration | storage.Controller | Critical | This alarm occurs when the storage controller contains invalid configuration. | |
| StorageControllerInvalidFirmware | storage.Controller | Critical | This alarm occurs when the storage controller contains invalid firmware. | |
| StorageControllerLostConfiguration | storage.Controller | Critical | This alarm occurs when the storage controller has lost its configuration data. | When you replace a RAID controller, the RAID configuration that is stored in the controller is lost. Use this procedure to restore your RAID configuration to the new RAID Controller. If the above actions do not resolve the issue, create a |
| StorageControllerUnresponsive | storage.Controller | Critical | This alarm occurs when contact with the storage controller is probably lost, and the storage controller has become unresponsive. | For PCI and mezz-based storage controllers, check the seating of the storage controller. If the problem persists, create a show tech-support file and contact Cisco TAC to see if the controller needs replacement. |
| StoragePhysicalDiskFailed | storage.PhysicalDisk | Critical | This alarm occurs when the storage physical disk is in failed state. | If the drive state is in failed state, create a show tech-support file and contact Cisco TAC to see if the disk needs to be replaced. |
| StoragePhysicalDiskForeignConfig | storage.PhysicalDisk | Critical | This alarm occurs when the storage physical disk contains a foreign configuration. | If you see this fault, take the following actions: |
| StoragePhysicalDiskOffline | storage.PhysicalDisk | Critical | This alarm occurs when storage physical disk is in Offline state. | If you see this fault, take the following actions: |
| StoragePhysicalDiskPredictiveFailure | storage.PhysicalDisk | Critical | This alarm occurs when storage physical disk is in predictive failure state. | If the drive state is in predictive-failure state, create a show tech-support file and contact Cisco TAC to see if the disk needs to be replaced. |
| StoragePhysicalDiskReadyForRemoval | storage.PhysicalDisk | Informational (Info) | The physical disk is in quiesced state and ready for removal. | For further assistance, contact Cisco TAC. |
| StoragePhysicalDiskRebuilding | storage.PhysicalDisk | Informational (Info) | The physical disk is in rebuilding state. | For further assistance, contact Cisco TAC. |
| StoragePhysicalDiskSelfTestFail | storage.PhysicalDisk | Critical | This alarm occurs when the self-test on a storage physical disk has failed. | Create a |
| StoragePhysicalDiskUnConfiguredBad | storage.PhysicalDisk | Warning | This alarm occurs when the storage physical disk is in Unconfigured Bad state and is not available for RAID volume. | If you see this fault, take the following actions: |
| StorageRaidBatteryDegraded | storage.BatteryBackupUnit | Critical | This alarm occurs when the storage battery backup unit is in degraded state. | If you see this fault, take the following actions: |
| StorageVirtualDriveCacheDegraded | storage.VirtualDrive | Warning | Virtual drive cache is in degraded state. | For further assistance, contact Cisco TAC. |
| StorageVirtualDriveDegraded | storage.VirtualDrive | Critical | This alarm occurs when the storage virtual drive is in degraded state. | If you see this fault, take the following actions: |
| StorageVirtualDriveOffline | storage.VirtualDrive | Critical | This alarm occurs when the storage virtual drive is in offline state. | If you see this fault, take the following actions: |
| StorageVirtualDrivePartiallyDegraded | storage.VirtualDrive | Critical | The storage virtual drive is partially degraded. The operating condition of the virtual drive is not optimal. | If you see this fault, take the following actions: |
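
The Name, MO, and Severity columns in the table above lend themselves to simple triage logic. The following is a minimal Python sketch, not an Intersight API client: the `Alarm` dataclass, `SEVERITY_RANK`, `triage`, and `group_by_mo` are hypothetical names introduced only for illustration, and the sample records are a handful of rows copied from the table. The severity ordering (Critical, then Warning, then Info) is an assumption made for sorting.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumed ordering for this sketch only: lower rank means more urgent.
SEVERITY_RANK = {"Critical": 0, "Warning": 1, "Info": 2}


@dataclass(frozen=True)
class Alarm:
    """Hypothetical record mirroring three columns of the alarm table."""
    name: str
    mo: str
    severity: str


# A few rows copied from the table; not a complete list.
SAMPLE_ALARMS = [
    Alarm("AdapterAltImage", "adapter.Unit", "Warning"),
    Alarm("AdapterCounterfeit", "adapter.Unit", "Critical"),
    Alarm("ComputeBoardTemperatureWarning", "compute.Board", "Warning"),
    Alarm("StorageControllerFailed", "storage.Controller", "Critical"),
    Alarm("ComputeServerRecoveryKeyNotAvailable", "cond.Alarms", "Info"),
]


def triage(alarms):
    """Return alarms sorted by severity (most severe first), then by name."""
    return sorted(alarms, key=lambda a: (SEVERITY_RANK[a.severity], a.name))


def group_by_mo(alarms):
    """Group alarm names by the managed object (MO) type they are raised on."""
    groups = defaultdict(list)
    for alarm in alarms:
        groups[alarm.mo].append(alarm.name)
    return dict(groups)


if __name__ == "__main__":
    for alarm in triage(SAMPLE_ALARMS):
        print(f"{alarm.severity:<8} {alarm.mo:<20} {alarm.name}")
```

Sorting by severity first puts alarms such as AdapterCounterfeit and StorageControllerFailed ahead of warning-level conditions, while grouping by MO helps when several alarms point at the same component type.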