Server Components Alarms
The following table describes the supported alarms for servers.
Name | MO | Severity | Explanation | Recommended Action |
---|---|---|---|---|
BladeMigrationDetected | compute.Blade | Critical | This alarm occurs when a server has been detected in a slot different from the one it was discovered in. |
|
PhysicalMissing | compute.Physical | Critical | This alarm occurs when a server has been removed from the slot it was discovered in. |
|
PhysicalWillBoot | compute.Physical | Critical | The UCS Will Boot check is a cursory check that ensures the blade is configured properly to allow the BIOS to proceed. This alarm indicates that the server encountered a critical Will Boot error, which occurs when the CPU and DIMM configuration check fails. |
|
BoardTemperatureWarning | compute.Board | Warning | The motherboard has a warning temperature threshold condition. |
|
BoardTemperatureCritical | compute.Board | Critical | The motherboard has a critical temperature threshold condition. |
|
BoardVoltageWarning | compute.Board | Warning | The motherboard has a warning voltage threshold condition. |
|
BoardVoltageCritical | compute.Board | Critical | The motherboard has a critical voltage threshold condition. |
|
BoardPower | compute.Board | Critical | The motherboard has a critical power problem. This occurs when the motherboard power consumption exceeds certain threshold limits and the power usage sensors on the server detect a problem. |
|
ServerAdapterUnitDeprecated | compute.Physical | Critical | One or more adapters connected to the server are deprecated, or are not supported in the current Intersight release. |
|
RackUnitHealthWarning | compute.RackUnit | Warning | The server's health state has reached the warning threshold. |
|
RackUnitHealthCritical | compute.RackUnit | Critical | The server's health state has reached the critical threshold. |
|
PciNodeInsertedPowerOnRequired | compute.Blade | Warning | This alarm occurs if a PCIe node is inserted while the compute node is powered off. |
|
PciNodeRemovedPowerOnRequired | compute.Blade | Warning | This alarm occurs if a PCIe node is removed while the compute node is powered off. |
After removing the PCIe node, power on the PCIe node's paired compute node. |
PciNodeInsertedPowerCycleRequired | compute.Blade | Warning | This alarm occurs if a PCIe node is inserted while the compute node is powered on. |
|
PciNodeRemovedPowerCycleRequired | compute.Blade | Warning | This alarm occurs if a PCIe node is removed while the compute node is powered on. |
|
PciNodeUnsupported | compute.Blade | Warning | Unsupported PCIe node detected. PCIe node will remain powered off. |
|
PciNodeUnidentified | compute.Blade | Warning | Unidentified PCIe node detected. PCIe node will remain powered off. |
|
HostEthInterfaceDown | adapter.HostEthInterface | Critical | The uplink interface is shut down, or a transient error caused the vNIC to fail. |
|
HostEthInterfaceStandByActive | adapter.HostEthInterface | Warning | The preferred path for the failover-enabled vNIC is down, so the secondary path is currently active. |
|
HostFcInterfaceDown | adapter.HostFcInterface | Critical | The uplink interface is shut down, or a transient error caused the vHBA to fail. |
|
NotReachable | adapter.Unit | Warning | Adapter is not reachable or the connectivity is not discovered from the Fabric Interconnects or FEX. |
|
CardTemperatureWarning | graphics.Card | Warning | The GPU has a warning temperature threshold condition. |
|
CardTemperatureCritical | graphics.Card | Critical | The GPU has a critical temperature threshold condition. |
|
UnitTemperatureWarning | memory.Unit | Warning | The memory unit has a warning temperature threshold condition. |
|
UnitTemperatureCritical | memory.Unit | Critical | The memory unit has a critical temperature threshold condition. |
|
UnitUncorrectableError | memory.Unit | Critical | The memory unit has encountered an uncorrectable ECC error. |
|
UnitBankError | memory.Unit | Warning | The memory unit has encountered a Bank Virtual lock step (VLS) error. |
|
UnitRankError | memory.Unit | Warning | The memory unit has encountered a Rank Virtual lock step (VLS) error. |
|
UnitInvalidPopulation | memory.Unit | Critical | The DIMM slot population is invalid. |
|
UnitRasModeError | memory.Unit | Critical | The memory unit has encountered a RAS Mode error. |
|
UnitMismatchError | memory.Unit | Critical | A memory mismatch has been detected on this memory unit. |
Create a |
UnitSpdError | memory.Unit | Critical | The memory unit has encountered an SPD error. |
Create a |
UnitBistError | memory.Unit | Critical | The memory unit has encountered a BIST error. |
Create a |
UnitInvalidTypeError | memory.Unit | Critical | The memory unit type is invalid. |
Create a |
UnitCatErr | processor.Unit | Critical | The processor has encountered a CATERR error. The system event log (SEL) contains events related to the processor's catastrophic error (CATERR) sensor. |
Create a |
UnitThermtrip | processor.Unit | Critical | The processor has encountered a THERMTRIP error. |
|
UnitTemperatureWarning | processor.Unit | Warning | The processor has a warning temperature threshold condition. |
|
UnitTemperatureCritical | processor.Unit | Critical | The processor has a critical temperature threshold condition. |
|
NodeRiser1Missing | pci.Node | Warning | The PCIe node Riser 1 is missing. No PCIe lanes to CPU1 can be utilized. |
|
NodeRiserMismatch | pci.Node | Warning | A PCIe node riser type mismatch was detected. Risers will remain powered off. |
|
NodeRiser2PresentCPU2Absent | pci.Node | Warning | PCIe node Riser 2 is present, but CPU2 is absent. PCIe slots on Riser 2 are not connected. |
|
NodePCIeLinkConfigIssue | pci.Node | Warning | PCIe link or port configuration issue detected. PCIe links may not be up or configured properly between PCIe slots and CPUs. |
|
NodeRiser1PowerFault | pci.Node | Critical | PCIe node Riser 1 power fault detected. |
|
NodeRiser2PowerFault | pci.Node | Critical | PCIe node Riser 2 power fault detected. |
|
NodePowerFault | pci.Node | Critical | PCIe node power fault detected. |
|
NodeUnsupportedPCIeCardPresentOnRiser1 | pci.Node | Warning | PCIe node has an unsupported PCIe card present on Riser 1. Riser will remain powered off. |
|
NodeUnsupportedPCIeCardPresentOnRiser2 | pci.Node | Warning | PCIe node has an unsupported PCIe card present on Riser 2. Riser will remain powered off. |
|
NodeUnknownPCIeCardPresentOnRiser1 | pci.Node | Warning | PCIe node has an unknown PCIe card present on Riser 1. Riser will remain powered off. |
|
NodeUnknownPCIeCardPresentOnRiser2 | pci.Node | Warning | PCIe node has an unknown PCIe card present on Riser 2. Riser will remain powered off. |
|
NodePresentXFM1Absent | pci.Node | Warning | PCIe node detected with missing XFM1. PCIe node cannot be fully managed without both XFMs being present. |
|
ControllerLostConfiguration | storage.Controller | Critical | This alarm occurs when the storage controller has lost its configuration data. |
When you replace a RAID controller, the RAID configuration that is stored in the controller is lost. Use this procedure to restore your RAID configuration to the new RAID Controller.
If the above actions do not resolve the issue, create a |
ControllerFailed | storage.Controller | Critical | This alarm occurs when the storage controller is in failed state. |
If the storage controller is in failed state, create a show tech-support file and contact Cisco TAC to see if the controller needs replacement. |
ControllerFlashDegraded | storage.Controller | Critical | This alarm occurs when the storage controller is functional, but the on-board flash has degraded. |
If you see this fault, take the following action:
|
ControllerFlashFailed | storage.Controller | Critical | This alarm occurs when the storage controller is functional but the on-board flash has failed. |
If the flash is in failed state, create a |
ControllerInvalidFirmware | storage.Controller | Critical | This alarm occurs when the storage controller contains invalid firmware. |
|
ControllerAuthFailure | storage.Controller | Critical | This alarm occurs when SPDM authentication fails for the storage controller. |
If you see this fault, take the following actions:
|
ControllerInvalidConfiguration | storage.Controller | Critical | This alarm occurs when the storage controller contains an invalid configuration. |
|
ControllerUnresponsive | storage.Controller | Critical | This alarm occurs when contact with the storage controller has probably been lost and the storage controller has become unresponsive. |
For PCI and mezzanine-based storage controllers, check the seating of the storage controller. If the problem persists, create a show tech-support file and contact Cisco TAC to see if the controller needs replacement. |
ControllerForeignConfig | storage.Controller | Critical | This alarm occurs when foreign configurations are present in the physical drives attached to the storage controller. |
If you see this fault, take the following actions:
|
PhysicalDiskFailed | storage.PhysicalDisk | Critical | This alarm occurs when the storage physical disk is in failed state. |
If the drive is in failed state, create a show tech-support file and contact Cisco TAC to see if the disk needs to be replaced. |
PhysicalDiskPredictiveFailure | storage.PhysicalDisk | Critical | This alarm occurs when the storage physical disk is in predictive failure state. |
If the drive is in predictive-failure state, create a show tech-support file and contact Cisco TAC to see if the disk needs to be replaced. |
PhysicalDiskOffline | storage.PhysicalDisk | Critical | This alarm occurs when storage physical disk is in Offline state. |
If you see this fault, take the following actions:
|
PhysicalDiskUnConfiguredBad | storage.PhysicalDisk | Warning | This alarm occurs when the storage physical disk is in Unconfigured Bad state and is not available for RAID volume. |
If you see this fault, take the following actions:
|
PhysicalDiskForeignConfig | storage.PhysicalDisk | Critical | This alarm occurs when the storage physical disk contains a foreign configuration. |
If you see this fault, take the following actions:
|
PhysicalDiskSelfTestFail | storage.PhysicalDisk | Critical | This alarm occurs when the self-test on a storage physical disk has failed. |
Create a |
VirtualDriveDegraded | storage.VirtualDrive | Critical | This alarm occurs when the storage virtual drive is in degraded state. |
If you see this fault, take the following actions:
|
VirtualDrivePartiallyDegraded | storage.VirtualDrive | Critical | The storage virtual drive is partially degraded. The operating condition of the virtual drive is not optimal. |
If you see this fault, take the following actions:
|
VirtualDriveOffline | storage.VirtualDrive | Critical | This alarm occurs when the storage virtual drive is in offline state. |
If you see this fault, take the following actions:
|
RaidBatteryDegraded | storage.BatteryBackupUnit | Critical | This alarm occurs when the storage battery backup unit is in degraded state. |
If you see this fault, take the following actions:
|
FruMissing | equipment.Fru | Critical | This alarm typically occurs when any hardware component is missing in a server, chassis, FEX or FI and the server or chassis is not rediscovered manually. |
If you see this fault, take the following actions:
|
FruReplaced | equipment.Fru | Critical | This alarm typically occurs when any adapter is replaced in a server and the server is not decommissioned and recommissioned. |
If you see this fault, take the following actions:
|
RackFanSpeedCritical | equipment.Fan | Critical | The server fan has a speed threshold condition. This fault typically occurs when a fan is running at a speed that is too slow or too fast. A malfunctioning fan can affect the operating temperature of the rack server. |
If you see this fault, take the following actions:
|
RackPsuInputLost | equipment.Psu | Warning | The power supply has no AC input. |
|
RackPsuTemperatureCritical | equipment.Psu | Critical | The power supply has a temperature threshold condition. |
|
RackPsuTemperatureWarning | equipment.Psu | Warning | The power supply has a temperature threshold condition. |
|
RackPsuOutputCurrentCritical | equipment.Psu | Critical | The power supply has an output current threshold condition. |
Create a |
RackPsuOutputCurrentWarning | equipment.Psu | Warning | The power supply has an output current threshold condition. |
Create a |
RackPsuOutputVoltageCritical | equipment.Psu | Critical | The power supply has an output voltage threshold condition. |
Create a |
RackPsuOutputVoltageWarning | equipment.Psu | Warning | The power supply has an output voltage threshold condition. |
Create a |
RackPsuOutputPowerCritical | equipment.Psu | Critical | The server power supply has an output power threshold condition. This fault occurs if the current output of the PSU in the rack server is far above or below the non-recoverable threshold value. |
Create a |
RackPsuOutputPowerWarning | equipment.Psu | Warning | The server power supply has an output power threshold condition. This fault occurs if the current output of the PSU in the rack server is far above or below the non-recoverable threshold value. |
Create a |
ServerProfileStateOutOfSyncWarning | server.profile | Warning | The server profile has moved to Out-of-sync state. |
|
ServerProfileStatePendingChangesWarning | server.profile | Warning | The server profile has moved to pending-changes state. |
Check the server policy configuration for Pending-changes and deploy the server profile again to apply the changes. |
ComputeCimcFirmwareNotSupported | compute.BladeIdentity | Warning | This fault indicates that one of the IO modules is missing. |
Intersight Managed Mode does not support the existing firmware version. Upgrade the server using the firmware upgrade option in the Chassis tab. |
ComputeServerNotConnected | compute.BladeIdentity | Warning | Server discovery failed because the device is not connected. |
For further assistance, contact Cisco TAC. |
ComputeServerDisconnected | compute.Physical | Warning | Server is not reachable. |
Check the server's network connectivity. |
ComputePhysicalBiosPostTimeOut | compute.Physical | Critical | This alarm typically occurs when the server has encountered a BIOS POST timeout. |
For further assistance, contact Cisco TAC. |
StoragePhysicalDiskReadyForRemoval | storage.PhysicalDisk | Informational (Info) | The physical disk is in quiesced state and ready for removal. |
For further assistance, contact Cisco TAC. |
StoragePhysicalDiskRebuilding | storage.PhysicalDisk | Informational (Info) | The physical disk is in rebuilding state. |
For further assistance, contact Cisco TAC. |
StorageVirtualDriveCacheDegraded | storage.VirtualDrive | Warning | Virtual drive cache is in degraded state. |
For further assistance, contact Cisco TAC. |