The system generates alert messages to notify the
administrator when a predefined condition is met, such as when an activated
service goes from up to down. The system can send alerts as email or epage.
RTMT, which supports alert defining, setting, and viewing,
contains preconfigured and user-defined alerts. Although you can perform
configuration tasks for both types, you cannot delete preconfigured alerts
(whereas you can add and delete user-defined alerts). The Alert menu comprises
the following menu options:
Alert Central—This option comprises the history and current status
of every alert in the system.
Note
You can also access Alert Central by clicking the Alert Central
icon in the hierarchy tree in the system drawer.
Set Alert/Properties—This menu category allows you to set alerts
and alert properties.
Remove Alert—This menu category allows you to remove an alert.
Enable Alert—With this menu category, you can enable alerts.
Disable Alert—You can disable an alert with this category.
Suspend cluster/node Alerts—This menu category allows you to
temporarily suspend alerts on a particular server or on an entire cluster (if
applicable).
Clear Alerts—This menu category allows you to reset an alert
(change the color of an alert item to black) to signal that an alert has been
handled. After an alert has been raised, its color will automatically change in
RTMT and will stay that way until you manually clear the alert.
Note
The manual clear alert action does not update the System cleared
timestamp column in Alert Central. This column is updated only if alert
condition is automatically cleared.
Clear All Alerts—This menu category allows you to clear all
alerts.
Reset all Alerts to Default Config—This menu category allows you
to reset all the alerts to the default configuration.
Alert Detail—This menu category provides detailed information on
alert events.
Config Email Server—In this category, you can configure your
email server to enable alerts.
Config Alert Action—This category allows you to set actions to
take for specific alerts; you can configure the actions to send the alerts to
desired email recipients.
In RTMT, you configure alert notification for perfmon counter
value thresholds and set alert properties for the alert, such as the threshold,
duration, frequency, and so on. RTMT predefined alerts are configured for
perfom counter value thresholds as wells as event (alarms) notifications.
You can locate Alert Central under the Tools hierarchy tree
in the quick launch. Alert Central provides both the current status and the
history of all the alerts in the system.
RTMT displays both preconfigured alerts and custom alerts in Alert Central. RTMT organizes the alerts under the applicable tabs—System, Unified CCX, and Custom.
You can enable or disable preconfigured and custom alerts in Alert Central; however, you cannot delete preconfigured alerts.
You can configure both preconfigured and user-defined alerts
in RTMT. You can also disable both preconfigured and user-defined alerts in
RTMT. You can add and delete user-defined alerts in the performance-monitoring
window; however, you cannot delete preconfigured alerts.
Note
Severity levels for Syslog entries match the severity level for all
RTMT alerts. If RTMT issues a critical alert, the corresponding Syslog entry
also specifies critical.
The following table provides a list of fields that you may use to
configure each alert; users can configure preconfigured fields, unless
otherwise noted.
Table 1 Alert customization
Field
Description
Comment
Alert Name
High-level name of the monitoring item with which RTMT
associates an alert
Descriptive name. For preconfigured alerts, you cannot change
this field. For a list of preconfigured alerts, see Preconfigured and custom alerts.
Description
Description of the alert
You cannot edit this field for preconfigured alerts. For a
list of preconfigured alerts, see
Preconfigured and custom alerts.
Performance Counter(s)
Source of the performance counter
You cannot change this field. You can associate only one
instance of the performance counter with an alert.
Threshold
Condition to raise alert (value is...)
Specify up < - > down, less than #, %, rate greater than
#, %, rate. This field is applicable only for alerts based on performance
counters.
Value Calculated As
Method used to check the threshold condition
Specify value to be evaluated as absolute, delta (present -
previous), or % delta. This field is applicable only for alerts based on
performance counters.
Duration
Condition to raise alert (how long value threshold has to
persist before raising alert)
Options include the system sending the alert immediately or
after a specified time that the alert has persisted. This field is applicable
only for alerts based on performance counters.
Number of Events Threshold
Raise alert only when a configurable number of events exceeds
a configurable time interval (in minutes).
This field is applicable only for event based alerts.
Node IDs
Cluster or list of servers to monitor
Alert Action ID
ID of alert action to take (System always logs alerts no
matter what the alert action.)
Alert action gets defined first (see Alert fields).
If this field is blank, that indicates that email is disabled.
Enable Alerts
Enable or disable alerts.
Options include enabled or disabled.
Clear Alert
Resets alert (change the color of an alert item from to black)
to signal that the alert has been resolved
After an alert has been raised, its color will automatically
change to and stay that way until you manually clear the alert. Use Clear All
to clear all alerts.
Alert Details
Displays the detail of an alert (not configurable)
Alert Generation Rate
How often to generate alert when alert condition persists
Specify every X minutes. (Raise alert once every X minutes if
condition persists.)
Specify every X minutes up to Y times. (Raise alert Y times
every X minutes if condition persists.)
User Provide Text
Administrator to append text on top of predefined alert text
N/A
Severity
For viewing purposes (for example, show only Sev. 1 alerts)
Specify defaults that are provided for predefined (for
example, Error, Warning, Information) alerts.
In RTMT, you can configure alert actions for every alert
that is generated and have the alert action sent to email recipients that you
specify in the alert action list.
The following table provides a list of fields that you will use to
configure alert actions. Users can configure all fields, unless otherwise
marked.
Table 2 Alert action configuration
Field
Description
Comment
Alert Action ID
ID of alert action to take
Specify descriptive name.
Mail Recipients
List of email addresses. You can selectively enable/disable an
individual email in the list.
Some preconfigured alerts allow you to initiate a trace
download based on the occurrence of an event. You can automatically capture
traces when a particular event occurs by checking the Enable Trace Download
check box in Set Alert/Properties for the following alerts:
CriticalServiceDown–CriticalServiceDown alert gets generated when
any service is down.
Note
The RTMT backend service checks status (by default) every 30
seconds. If service goes down and comes back up within that period,
CriticalServiceDown alert may not get generated.
Note
CriticalServiceDown alert monitors only those services that are
listed in RTMT Critical Services.
CoreDumpFileFound–CoreDumpFileFound alert gets generated when RTMT
backend service detects a new Core Dump file.
Note
You can configure both CriticalServiceDown and CoreDumpFileFound
alerts to download corresponding trace files for troubleshooting purposes. This
helps preserve trace files at the time of crash.
Caution
Enabling Trace Download may affect services on the server.
Configuring a high number of downloads will adversely impact the quality of
services on the server.
The alert log stores the alert, which is also stored in
memory. The memory gets cleared at a constant interval, leaving the last 30
minutes of data in the memory. When the service starts/restarts, the last 30
minutes of the alert data load into the memory by the system reading from the
alert logs on the server or on all servers in the cluster (if applicable). The
alert data in the memory gets sent to the RTMT clients on request.
Upon RTMT startup, RTMT shows all logs that occurred in the
last 30 minutes in the Alert Central log history. Alert log periodically gets
updated, and new logs get inserted into the log history window. After the
number of logs reaches 100, RTMT removes the oldest 40 logs.
The following file name format for the alert log applies:
AlertLog_MM_DD_YYYY_hh_mm.csv.
The alert log includes the following attributes:
Time Stamp—Time when RTMT logs the data
Alert Name—Descriptive name of the alert
Node—Server name for where RTMT raised the alert
Alert Message—Detailed description about the alert
Type—Type of the alert
Description—Description of the monitored object
Severity—Severity of the alert
PollValue—Value of the monitored object where the alert condition
occurred
Action—Alert action taken
Group ID—Identifies the source of the alert
The first line of each log file comprises the header.
Details of each alert get written in a single line, separated by a comma.
Log Partition Monitoring, which is installed automatically with the system, uses configurable thresholds to monitor the disk usage of the log partition on a server. The Cisco Log Partition Monitoring Tool service starts automatically after installation of the system.
Every 5 minutes, Log Partition Monitoring uses the following configured thresholds to monitor the disk usage of the log partition and the spare log partition on a server:
LogPartitionLowWaterMarkExceeded (% disk space)—When the disk usage is above the percentage that you specify, LPM sends out an alarm message to syslog and an alert to RTMT Alert central. To save the log files and regain disk space, you can use trace and log central option in RTMT.
LogPartitionHighWaterMarkExceeded (% disk space)—When the disk usage is above the percentage that you specify, LPM sends an alarm message to syslog and an alert to RTMT Alert central.
SparePartitionLowWaterMarkExceeded (% disk space)—When the disk usage is above the percentage that you specify, LPM sends out an alarm message to syslog and an alert to RTMT Alert central. To save the log files and regain disk space, you can use trace and log central option in RTMT.
SparePartitionHighWaterMarkExceeded (% disk space)—When the disk usage is above the percentage that you specify, LPM sends an alarm message to syslog and an alert to RTMT Alert central.
In addition, Cisco Log Partitioning Monitoring Tool service checks the server every 5 seconds for newly created core dump files. If new core dump files exist, Cisco Log Partitioning Monitoring Tool service sends a CoreDumpFileFound alarm and an alert to Alert Central with information on each new core file.
To utilize log partition monitor, verify that the Cisco Log Partitioning Monitoring Tool service, a network service, is running on Cisco Unified Serviceability on the server or on each server in the cluster (if applicable). Stopping the service causes a loss of feature functionality.
When the log partition monitoring services starts at system startup, the service checks the current disk space utilization. If the percentage of disk usage is above the low water mark, but less than the high water mark, the service sends a alarm message to syslog and generates a corresponding alert in RTMT Alert central.
To configure Log Partitioning Monitoring, set the alert properties for the LogPartitionLowWaterMarkExceeded and LogPartitionHighWaterMarkExceeded alerts in Alert Central. For more information, see Set alert properties.
To offload the log files and regain disk space on the server, you should collect the traces that you are interested in saving by using the Real-Time Monitoring tool.
If the percentage of disk usage is above the high water mark that you configured, the system sends an alarm message to syslog, generates a corresponding alert in RTMT Alert Central, and automatically purges log files until the value reaches the low water mark.
Note
Log Partition Monitoring automatically identifies the common partition that contains an active directory and inactive directory. The active directory contains the log files for the current installed version of the software (Unified CCX), and the inactive directory contains the log files for the previous installed version of the software. If necessary, the service deletes log files in the inactive directory first. The service then deletes log files in the active directory, starting with the oldest log file for every application until the disk space percentage drops below the configured low water mark. The service does not send an email when log partition monitoring purges the log files.
After the system determines the disk usage and performs the necessary tasks (sending alarms, generating alerts, or purging logs), log partition monitoring occurs at regular 5 minute intervals.