Configuring Settings for the Fault Collection Policy
Fault Collection Policy
The fault collection policy controls the lifecycle of a fault in a Cisco UCS instance, including when faults are cleared, the flapping interval (the length of time between the fault being raised and the condition being cleared), and the retention interval (the length of time a fault is retained in the system).
A fault in Cisco UCS has the following lifecycle:
1. A condition occurs in the system and Cisco UCS Manager raises a fault. This is the active state.
2. When the fault is alleviated, it is cleared if the time between the fault being raised and the condition being cleared is greater than the flapping interval. Otherwise, the fault remains raised but its status changes to soaking-clear. Flapping occurs when a fault is raised and cleared several times in rapid succession. During the flapping interval, the fault retains its severity for the length of time specified in the fault collection policy.
3. If the condition reoccurs during the flapping interval, the fault remains raised and its status changes to flapping. If the condition does not reoccur during the flapping interval, the fault is cleared.
4. When a fault is cleared, it is deleted if the clear action is set to delete or if the fault was previously acknowledged. Otherwise, it is retained until either the retention interval expires or the fault is acknowledged.
5. If the condition reoccurs during the retention interval, the fault returns to the active state. If the condition does not reoccur, the fault is deleted.
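The lifecycle above can be read as a small state machine. The following Python sketch is illustrative only and is not Cisco UCS Manager code: the names Policy, Fault, and State are hypothetical, and it assumes that the soaking timer starts when the condition clears, that a flapping fault re-enters soaking-clear when its condition clears again, and that timers are evaluated only when tick() or another event method is called.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    ACTIVE = "active"
    SOAKING_CLEAR = "soaking-clear"
    FLAPPING = "flapping"
    CLEARED = "cleared"   # retained while the retention interval runs
    DELETED = "deleted"


@dataclass
class Policy:
    flap_interval: float                  # seconds
    retention_interval: Optional[float]   # seconds; None models "forever"
    clear_action: str = "retain"          # "retain" or "delete"


class Fault:
    def __init__(self, policy: Policy, now: float):
        self.policy = policy
        self.state = State.ACTIVE         # step 1: the fault is raised
        self.acknowledged = False
        self.raised_at = now
        self.timer_start = now            # start of the current soak or retention window

    def condition_cleared(self, now: float) -> None:
        if self.state not in (State.ACTIVE, State.FLAPPING):
            return
        long_lived = now - self.raised_at > self.policy.flap_interval
        if self.state is State.ACTIVE and long_lived:
            self._clear(now)              # step 2: raised longer than the flapping interval
        else:
            self.state = State.SOAKING_CLEAR   # assumption: flapping faults soak again
            self.timer_start = now

    def condition_raised(self, now: float) -> None:
        self.tick(now)
        if self.state is State.SOAKING_CLEAR:
            self.state = State.FLAPPING   # step 3: reoccurred during the flapping interval
        elif self.state is State.CLEARED:
            self.state = State.ACTIVE     # step 5: reoccurred during retention
            self.raised_at = now

    def acknowledge(self, now: float) -> None:
        self.tick(now)
        self.acknowledged = True
        if self.state is State.CLEARED:
            self.state = State.DELETED    # step 4: retention ends on acknowledgment

    def tick(self, now: float) -> None:
        """Apply any timer expiry that has happened by `now`."""
        flap = self.policy.flap_interval
        if self.state is State.SOAKING_CLEAR and now - self.timer_start >= flap:
            self._clear(self.timer_start + flap)
        keep = self.policy.retention_interval
        if (self.state is State.CLEARED and keep is not None
                and now - self.timer_start >= keep):
            self.state = State.DELETED    # step 5: retention interval expired

    def _clear(self, now: float) -> None:
        # Step 4: delete immediately, or retain and start the retention window.
        if self.policy.clear_action == "delete" or self.acknowledged:
            self.state = State.DELETED
        else:
            self.state = State.CLEARED
            self.timer_start = now


if __name__ == "__main__":
    fault = Fault(Policy(flap_interval=10, retention_interval=30 * 86400), now=0)
    fault.condition_cleared(now=4)
    print(fault.state.value)  # soaking-clear
```

Passing timestamps explicitly instead of reading a clock keeps the sketch deterministic, which makes it easy to try different flapping and retention intervals by hand.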
Configuring the Fault Collection Policy
Procedure
Command or Action
Purpose
Step 1
UCS-A# scope monitoring
Enters monitoring mode.
Step 2
UCS-A /monitoring # scope fault policy
Enters monitoring fault policy mode.
Step 3
UCS-A /monitoring/fault-policy # set clear-action {delete | retain}
Specifies whether to retain or delete all cleared messages. If the retain option is specified, the length of time that the messages are retained is determined by the set retention-interval command.
Step 4
UCS-A /monitoring/fault-policy # set flap-interval seconds
Specifies the time interval (in seconds) the system waits before changing a fault state. Flapping occurs when a fault is raised and cleared several times in rapid succession. To prevent this, the system does not allow a fault to change state until the flapping interval has elapsed after the last state change. If the fault is raised again during the flapping interval, it returns to the active state; otherwise, the fault is cleared.
Step 5
UCS-A /monitoring/fault-policy # set retention-interval {days hours minutes seconds | forever}
Specifies the time interval the system retains all cleared fault messages before deleting them. The system can retain cleared fault messages forever, or for the specified number of days, hours, minutes, and seconds.
Step 6
UCS-A /monitoring/fault-policy # commit-buffer
Commits the transaction.
The following example configures the fault collection policy to retain cleared fault messages for 30 days, sets the flapping interval to 10 seconds, and commits the transaction.
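The session below is a sketch assembled from the command syntax in the preceding procedure, not captured output. It assumes that set retention-interval takes four positional values (days, hours, minutes, seconds) and that the prompt shows an asterisk while changes are uncommitted, as in the other examples in this guide.

UCS-A# scope monitoring
UCS-A /monitoring # scope fault policy
UCS-A /monitoring/fault-policy # set clear-action retain
UCS-A /monitoring/fault-policy* # set flap-interval 10
UCS-A /monitoring/fault-policy* # set retention-interval 30 0 0 0
UCS-A /monitoring/fault-policy* # commit-buffer
UCS-A /monitoring/fault-policy #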
Core File Exporter
Cisco UCS Manager uses the Core File Exporter to export core files as soon as they occur to a specified location on the network through TFTP. This functionality allows you to export the tar file with the contents of the core file.
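Configuring the Core File Exporter
Procedure
Command or Action
Purpose
Step 1
UCS-A# scope monitoring
Enters monitoring mode.
Step 2
UCS-A /monitoring # scope sysdebug
Enters monitoring system debug mode.
Step 3
UCS-A /monitoring/sysdebug # enable core-export-target
Enables the core file exporter. When the core file exporter is enabled and an error causes the server to perform a core dump, the system exports the core file via TFTP to the specified remote server.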
Step 4
UCS-A /monitoring/sysdebug # set core-export-target path path
Specifies the path to use when exporting the core file to the remote server.
Step 5
UCS-A /monitoring/sysdebug # set core-export-target port port-num
Specifies the port number to use when exporting the core file via TFTP. The range of valid values is 1 to 65,535.
Step 6
UCS-A /monitoring/sysdebug # set core-export-target server-description description
Provides a description for the remote server used to store the core file.
Step 7
UCS-A /monitoring/sysdebug # set core-export-target server-name hostname
Specifies the hostname of the remote server to connect with via TFTP.
Step 8
UCS-A /monitoring/sysdebug # commit-buffer
Commits the transaction.
The following example enables the core file exporter, specifies the path and port to use when sending the core file, specifies the remote server hostname, provides a description for the remote server, and commits the transaction.
UCS-A# scope monitoring
UCS-A /monitoring # scope sysdebug
UCS-A /monitoring/sysdebug # enable core-export-target
UCS-A /monitoring/sysdebug* # set core-export-target path /root/CoreFiles/core
UCS-A /monitoring/sysdebug* # set core-export-target port 45000
UCS-A /monitoring/sysdebug* # set core-export-target server-description CoreFile192.168.10.10
UCS-A /monitoring/sysdebug* # set core-export-target server-name 192.168.10.10
UCS-A /monitoring/sysdebug* # commit-buffer
UCS-A /monitoring/sysdebug #