In Cisco UCS, a fault is a mutable object that is managed by Cisco UCS Manager. Each fault represents a failure in the Cisco UCS instance or an alarm threshold that has been raised. During its life cycle, a fault can change from one state or severity to another.
Each fault includes information about the operational state of the affected object at the time the fault was raised. If the fault is transitional and the failure is resolved, then the object transitions to a functional state.
A fault remains in Cisco UCS Manager until the fault is cleared and deleted according to the settings in the fault collection policy.
All Cisco UCS Manager faults are available through SNMP using the cucsFaultTable table, which contains one entry for every fault instance. Each entry has variables to indicate the nature of the problem, such as the severity and type. The same object is used to model all Cisco UCS fault types, including equipment issues (memory, CPU), FSM failures, configuration issues, environment issues (thermal, power), and connectivity issues. The cucsFaultTable table includes all active faults (faults that are raised and need user attention), as well as faults that have been cleared but not deleted yet because of the retention interval.
The cucsFaultTable table has the same information as the <faultInst> objects that can be queried through the XML API. In the Cisco UCS Manager GUI, the faults are available through the Admin tab under .
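As a minimal sketch of working with <faultInst> objects returned by the XML API, the following Python reads fault attributes from an XML fragment and separates active faults from cleared ones. The sample fragment, the wrapper element, the attribute names, and the use of "cleared" as a severity value are illustrative assumptions, not output captured from a Cisco UCS Manager instance.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only. The wrapper element, attribute names, and values
# are assumptions for this sketch, not real Cisco UCS Manager output.
SAMPLE = """
<outConfigs>
  <faultInst dn="sys/chassis-1/blade-2/fault-F0001" code="F0001" severity="major"
             type="equipment" cause="equipment-problem" occur="2"
             descr="Example fault description"/>
  <faultInst dn="sys/chassis-1/fault-F0002" code="F0002" severity="cleared"
             type="environmental" cause="thermal-problem" occur="1"
             descr="Example cleared fault"/>
</outConfigs>
"""


def parse_faults(xml_text):
    """Return one dict of XML attributes per <faultInst> element."""
    root = ET.fromstring(xml_text)
    return [dict(elem.attrib) for elem in root.iter("faultInst")]


faults = parse_faults(SAMPLE)

# Assumption: a cleared fault that is still retained reports severity "cleared".
active = [f for f in faults if f.get("severity") != "cleared"]
```

Keeping each fault as a plain dictionary avoids committing to a fixed schema, which seems reasonable when the exact attribute set may differ between releases.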
In Release 1.3 and later, Cisco UCS Manager sends a cucsFaultActiveNotif trap whenever a fault is raised in Cisco UCS Manager. As an exception to this rule, Cisco UCS Manager does not send traps for FSM faults. The trap variables indicate the nature of the problem, including the fault type, such as memory or configuration issue. Cisco UCS Manager sends a cucsFaultClearNotif trap whenever a fault has been cleared. A fault is cleared when the underlying issue has been resolved.
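The raise and clear behavior described above can be sketched as a small state tracker on the trap receiver side. This is a hypothetical helper, not part of any Cisco tool: the class name, the handle() signature, and the shape of the variable bindings are assumptions. It takes the notification name and a fault identifier that some SNMP trap receiver has already decoded.

```python
class FaultTracker:
    """Keeps the set of currently active faults, keyed by fault instance ID.

    Hypothetical helper: the notification names come from the MIB, but the
    method signature and the varbinds dictionary are assumptions.
    """

    def __init__(self):
        self.active = {}

    def handle(self, notif_name, fault_id, varbinds):
        if notif_name == "cucsFaultActiveNotif":
            # A fault was raised: remember its latest variable bindings.
            self.active[fault_id] = dict(varbinds)
        elif notif_name == "cucsFaultClearNotif":
            # The underlying issue was resolved: stop tracking the fault.
            self.active.pop(fault_id, None)
        else:
            raise ValueError(f"unexpected notification: {notif_name}")


tracker = FaultTracker()
tracker.handle("cucsFaultActiveNotif", 101, {"severity": "major", "type": "equipment"})
tracker.handle("cucsFaultActiveNotif", 102, {"severity": "minor", "type": "configuration"})
tracker.handle("cucsFaultClearNotif", 101, {})
```

Because Cisco UCS Manager does not send traps for FSM faults, a tracker like this never sees them. Polling cucsFaultTable is still needed for a complete view.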
In Release 1.3, the cucsFaultActiveNotif and cucsFaultClearNotif traps are defined in the CISCO-UNIFIED-COMPUTING-MIB MIB.
In Release 1.4, 2.0, and later, the cucsFaultActiveNotif and cucsFaultClearNotif traps are defined in the CISCO-UNIFIED-COMPUTING-NOTIFS-MIB MIB. All faults can be polled using SNMP GET through cucsFaultTable, which is defined in the CISCO-UNIFIED-COMPUTING-FAULT-MIB.
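One way to poll the table is with the Net-SNMP snmpwalk command, which retrieves a whole table through repeated GETNEXT requests. The sketch below only builds the command line and does not run it. The host name, the community string, and the SNMP version are placeholders, and it assumes the Cisco MIB files are installed where Net-SNMP can find them.

```python
import shlex

FAULT_MIB = "CISCO-UNIFIED-COMPUTING-FAULT-MIB"


def build_walk_command(host, community, version="2c"):
    """Build an snmpwalk argument list for cucsFaultTable (placeholders only)."""
    return [
        "snmpwalk",
        "-v", version,           # SNMP protocol version
        "-c", community,         # community string (placeholder value)
        "-m", "+" + FAULT_MIB,   # load this MIB in addition to the defaults
        host,
        FAULT_MIB + "::cucsFaultTable",
    ]


# "ucs.example.com" and "public" are hypothetical values for illustration.
command = build_walk_command("ucs.example.com", "public")
command_line = " ".join(shlex.quote(arg) for arg in command)
```

The argument list can be passed to subprocess.run(command, capture_output=True, text=True) on a host where snmpwalk is available.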
For more details on Cisco UCS Manager faults, see the Cisco UCS Faults and Error Messages Reference.
The following table describes the attributes exposed by the cucsFaultTable.
Table 1. cucsFaultTable Attributes

| Attribute | Description |
| --- | --- |
| Fault Instance ID (Table Index) | A unique integer that identifies the fault. |
| Affected Object DN | The distinguished name of the mutable object that has the fault. |
| Affected Object OID | The object identifier (OID) of the mutable object that has the fault. |
| Creation Time | The time that the fault was created. |
| Last Modification | The time when any of the fault's attributes was last modified. |
| Code | A code that provides information specific to the nature of the fault. |
| Type | The fault type. |
| Cause | The probable cause of the fault. |
| Severity | The severity of the fault. |
| Occurrence | The number of times that a fault has occurred since it was created. |
| Description | A human-readable string that provides all information related to the fault. |
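The attributes in the table above can be modeled as a simple record for use in monitoring scripts. The following is a sketch under stated assumptions: the class name, field names, and field types are hypothetical, and the severity names and their ordering are assumed rather than taken from the MIB.

```python
from dataclasses import dataclass
from datetime import datetime

# Assumed severity names, ordered from least to most severe.
SEVERITY_ORDER = ["cleared", "info", "warning", "minor", "major", "critical"]


@dataclass
class FaultRow:
    """One cucsFaultTable entry; field names and types are hypothetical."""
    instance_id: int            # Fault Instance ID (table index)
    affected_dn: str            # Affected Object DN
    affected_oid: str           # Affected Object OID
    creation_time: datetime     # Creation Time
    last_modification: datetime # Last Modification
    code: str                   # Code (string form is an assumption)
    fault_type: str             # Type
    cause: str                  # Cause
    severity: str               # Severity
    occurrence: int             # Occurrence
    description: str            # Description


def active_faults(rows):
    """Drop rows that are cleared but still retained in the table."""
    return [r for r in rows if r.severity != "cleared"]


def most_severe_first(rows):
    """Sort by severity (highest first), then by occurrence count."""
    return sorted(
        rows,
        key=lambda r: (SEVERITY_ORDER.index(r.severity), r.occurrence),
        reverse=True,
    )
```

Sorting on the index into SEVERITY_ORDER keeps the ranking in one place, so the list can be adjusted if the actual severity values differ.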