The system generates alert messages to notify the administrator when a predefined condition is met, such as when an activated service goes from up to down. The system can send alerts as email or epage.
RTMT, which supports alert defining, setting, and viewing, contains preconfigured and user-defined alerts. Although you can perform configuration tasks for both types, you cannot delete preconfigured alerts (whereas you can add and delete user-defined alerts). The Alert menu comprises the following menu options:
Alert Central—This option comprises the history and current status of every alert in the system.
You can also access Alert Central by clicking the Alert Central icon in the hierarchy tree in the system drawer.
Set Alert/Properties—This menu category allows you to set alerts and alert properties.
Remove Alert—This menu category allows you to remove an alert.
Enable Alert—With this menu category, you can enable alerts.
Disable Alert—You can disable an alert with this category.
Suspend cluster/node Alerts—This menu category allows you to temporarily suspend alerts on a particular server or on an entire cluster (if applicable).
Clear Alerts—This menu category allows you to reset an alert (change the color of an alert item to black) to signal that an alert has been handled. After an alert has been raised, its color will automatically change in RTMT and will stay that way until you manually clear the alert.
The manual clear alert action does not update the System cleared timestamp column in Alert Central. This column is updated only if alert condition is automatically cleared.
Clear All Alerts—This menu category allows you to clear all alerts.
Reset all Alerts to Default Config—This menu category allows you to reset all the alerts to the default configuration.
Alert Detail—This menu category provides detailed information on alert events.
Config Email Server—In this category, you can configure your email server to enable alerts.
Config Alert Action—This category allows you to set actions to take for specific alerts; you can configure the actions to send the alerts to desired email recipients.
In RTMT, you configure alert notification for perfmon counter value thresholds and set alert properties for the alert, such as the threshold, duration, frequency, and so on. RTMT predefined alerts are configured for perfom counter value thresholds as wells as event (alarms) notifications.
You can locate Alert Central under the Tools hierarchy tree in the quick launch. Alert Central provides both the current status and the history of all the alerts in the system.
You can configure both preconfigured and user-defined alerts in RTMT. You can also disable both preconfigured and user-defined alerts in RTMT. You can add and delete user-defined alerts in the performance-monitoring window; however, you cannot delete preconfigured alerts.
Severity levels for Syslog entries match the severity level for all RTMT alerts. If RTMT issues a critical alert, the corresponding Syslog entry also specifies critical.
The following table provides a list of fields that you may use to configure each alert; users can configure preconfigured fields, unless otherwise noted.
Table 1 Alert customization
High-level name of the monitoring item with which RTMT associates an alert
Some preconfigured alerts allow you to initiate a trace download based on the occurrence of an event. You can automatically capture traces when a particular event occurs by checking the Enable Trace Download check box in Set Alert/Properties for the following alerts:
CriticalServiceDown–CriticalServiceDown alert gets generated when any service is down.
The RTMT backend service checks status (by default) every 30 seconds. If service goes down and comes back up within that period, CriticalServiceDown alert may not get generated.
CriticalServiceDown alert monitors only those services that are listed in RTMT Critical Services.
CoreDumpFileFound–CoreDumpFileFound alert gets generated when RTMT backend service detects a new Core Dump file.
You can configure both CriticalServiceDown and CoreDumpFileFound alerts to download corresponding trace files for troubleshooting purposes. This helps preserve trace files at the time of crash.
Enabling Trace Download may affect services on the server. Configuring a high number of downloads will adversely impact the quality of services on the server.
The alert log stores the alert, which is also stored in memory. The memory gets cleared at a constant interval, leaving the last 30 minutes of data in the memory. When the service starts/restarts, the last 30 minutes of the alert data load into the memory by the system reading from the alert logs on the server or on all servers in the cluster (if applicable). The alert data in the memory gets sent to the RTMT clients on request.
Upon RTMT startup, RTMT shows all logs that occurred in the last 30 minutes in the Alert Central log history. Alert log periodically gets updated, and new logs get inserted into the log history window. After the number of logs reaches 100, RTMT removes the oldest 40 logs.
The following file name format for the alert log applies: AlertLog_MM_DD_YYYY_hh_mm.csv.
The alert log includes the following attributes:
Time Stamp—Time when RTMT logs the data
Alert Name—Descriptive name of the alert
Node—Server name for where RTMT raised the alert
Alert Message—Detailed description about the alert
Type—Type of the alert
Description—Description of the monitored object
Severity—Severity of the alert
PollValue—Value of the monitored object where the alert condition occurred
Action—Alert action taken
Group ID—Identifies the source of the alert
The first line of each log file comprises the header. Details of each alert get written in a single line, separated by a comma.
Log Partition Monitoring, which is installed automatically with the system, uses configurable thresholds to monitor the disk usage of the log partition on a server. The Cisco Log Partition Monitoring Tool service starts automatically after installation of the system.
Every 5 minutes, Log Partition Monitoring uses the following configured thresholds to monitor the disk usage of the log partition and the spare log partition on a server:
LogPartitionLowWaterMarkExceeded (% disk space)—When the disk usage is above the percentage that you specify, LPM sends out an alarm message to syslog and an alert to RTMT Alert central. To save the log files and regain disk space, you can use trace and log central option in RTMT.
LogPartitionHighWaterMarkExceeded (% disk space)—When the disk usage is above the percentage that you specify, LPM sends an alarm message to syslog and an alert to RTMT Alert central.
SparePartitionLowWaterMarkExceeded (% disk space)—When the disk usage is above the percentage that you specify, LPM sends out an alarm message to syslog and an alert to RTMT Alert central. To save the log files and regain disk space, you can use trace and log central option in RTMT.
SparePartitionHighWaterMarkExceeded (% disk space)—When the disk usage is above the percentage that you specify, LPM sends an alarm message to syslog and an alert to RTMT Alert central.
In addition, Cisco Log Partitioning Monitoring Tool service checks the server every 5 seconds for newly created core dump files. If new core dump files exist, Cisco Log Partitioning Monitoring Tool service sends a CoreDumpFileFound alarm and an alert to Alert Central with information on each new core file.
To utilize log partition monitor, verify that the Cisco Log Partitioning Monitoring Tool service, a network service, is running on Cisco Unified Serviceability on the server or on each server in the cluster (if applicable). Stopping the service causes a loss of feature functionality.
When the log partition monitoring services starts at system startup, the service checks the current disk space utilization. If the percentage of disk usage is above the low water mark, but less than the high water mark, the service sends a alarm message to syslog and generates a corresponding alert in RTMT Alert central.
To configure Log Partitioning Monitoring, set the alert properties for the LogPartitionLowWaterMarkExceeded and LogPartitionHighWaterMarkExceeded alerts in Alert Central. For more information, see Set alert properties.
To offload the log files and regain disk space on the server, you should collect the traces that you are interested in saving by using the Real-Time Monitoring tool.
If the percentage of disk usage is above the high water mark that you configured, the system sends an alarm message to syslog, generates a corresponding alert in RTMT Alert Central, and automatically purges log files until the value reaches the low water mark.
Log Partition Monitoring automatically identifies the common partition that contains an active directory and inactive directory. The active directory contains the log files for the current installed version of the software (Unified CCX), and the inactive directory contains the log files for the previous installed version of the software. If necessary, the service deletes log files in the inactive directory first. The service then deletes log files in the active directory, starting with the oldest log file for every application until the disk space percentage drops below the configured low water mark. The service does not send an email when log partition monitoring purges the log files.
After the system determines the disk usage and performs the necessary tasks (sending alarms, generating alerts, or purging logs), log partition monitoring occurs at regular 5 minute intervals.