This document discusses Cisco Real-Time Monitoring Tool (RTMT) alerts and demonstrates how to troubleshoot some commonly seen alerts.
Cisco recommends that you have knowledge of Cisco CallManager Web Administration.
The information in this document is based on the Cisco CallManager Server 11.0.
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, make sure that you understand the potential impact of any command.
RTMT, which runs as a client-side application, uses HTTPS and TCP in order to monitor system performance, device status, device discovery, Computer Telephony Integration (CTI) applications, and voice messaging ports. RTMT can also be used to configure alerts for the cluster that it monitors.
The system generates alert messages in order to notify the administrator when a predefined condition is met, such as when an activated service goes from up to down. The system can send alerts by e-mail or e-page.
RTMT, which supports alert defining, setting, and viewing, contains preconfigured and user-defined alerts. Although you can perform configuration tasks for both types, you cannot delete preconfigured alerts.
Unified RTMT displays both preconfigured alerts and custom alerts in Alert Central as shown in the image.
You can also access Alert Central by clicking the Alert Central icon in the hierarchy tree in the system drawer.
Unified RTMT organizes the alerts under the applicable tabs: System, CallManager, Cisco Unity Connection, and Custom.
You can enable or disable preconfigured and custom alerts in Alert Central; however, you cannot delete preconfigured alerts.
For swap partitions, this alert indicates that the swap partition has little available space left and is heavily used by the system. The swap partition is normally used to extend the physical RAM capacity when needed. Under normal conditions, if there is enough RAM, swap should see little use.
These RTMT alerts can also be caused by a build-up of temp files. In that case, a reboot of the server is recommended in order to clear out any unnecessary temp files.
LogPartitionHighWaterMarkExceeded and LogPartitionLowWaterMarkExceeded
When you run show status on the CLI of a CUCM server, the output shows the used and free percentages of the logging partition in the CUCM disk space. The logging partition, also known as the common partition, holds the logs/traces and the CDR files of the server. Although these files are harmless, they accumulate over time and might cause issues in install/upgrade procedures due to lack of space. These alerts warn the administrator to clear the logs that have accumulated in the cluster/server.
LogPartitionLowWaterMarkExceeded: This alert is generated when the filled space reaches the threshold values configured for the alert. This alert serves as a pre-check indicator for the disk usage.
LogPartitionHighWaterMarkExceeded: This alert is generated when the filled space reaches the threshold value configured for the alert. Once the alert is generated, the server starts to auto-purge the oldest logs in order to bring the used space down to a value less than the HighWaterMark threshold.
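The relationship between the two thresholds can be illustrated with a minimal sketch. The function name, its parameters, and the sample readings are hypothetical and are not part of RTMT. The sketch also assumes that an alert applies as soon as usage is equal to or greater than the threshold.

```python
def classify_log_partition(used_pct, low_mark, high_mark):
    """Return the name of the watermark alert that applies, or None.

    used_pct, low_mark and high_mark are percentages of the common
    partition. This is an illustration, not the RTMT implementation.
    """
    if not 0 <= low_mark < high_mark <= 100:
        raise ValueError("expected 0 <= low_mark < high_mark <= 100")
    # Assumption: "reaches the threshold" is treated as inclusive (>=).
    if used_pct >= high_mark:
        return "LogPartitionHighWaterMarkExceeded"
    if used_pct >= low_mark:
        return "LogPartitionLowWaterMarkExceeded"
    return None


# Illustrative thresholds only; these are not product defaults.
for reading in (42, 55, 71):
    print(reading, classify_log_partition(reading, low_mark=50, high_mark=60))
```

A reading between the two marks produces only the low water mark alert, which is the point at which a manual purge is still possible before auto-purge starts.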
The best practice is to purge the logs manually as soon as the LogPartitionLowWaterMarkExceeded alert is received.
To do so, complete these steps:
Step 1. Launch RTMT.
Step 2. Select Alert Central, then perform these tasks:
Select LogPartitionHighWaterMarkExceeded, note its value and change its threshold value to 60%.
Select LogPartitionLowWaterMarkExceeded, note its value and change its threshold value to 50%.
Polling occurs every 5 minutes, so wait for 5-10 minutes, then verify that the required disk space is available. If you want to free up more disk space in the common partition, lower the LogPartitionHighWaterMarkExceeded and LogPartitionLowWaterMarkExceeded threshold values again (for example, to 30% and 20%).
Allow 15 to 20 minutes for the space in the common partition to clear. You can monitor the decrease in disk usage with the show status command from the CLI.
This brings down the usage of the common partition.
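As a rough aid for the verification step, the following sketch summarizes a series of used-percentage readings for the common partition. The function name and the sample values are hypothetical. The readings are assumed to be noted by hand from successive runs of show status, because the sketch does not parse CLI output.

```python
def purge_progress(readings, target_pct):
    """Summarize used-percentage readings of the common partition.

    readings: used percentages, oldest first (hypothetical values that
              an administrator notes from repeated `show status` runs).
    target_pct: usage the administrator wants to get below.
    """
    if not readings:
        raise ValueError("need at least one reading")
    # True when no reading is higher than the one taken before it.
    decreasing = all(
        later <= earlier for earlier, later in zip(readings, readings[1:])
    )
    return {
        "freed_pct": readings[0] - readings[-1],
        "decreasing": decreasing,
        "target_reached": readings[-1] < target_pct,
    }


# Hypothetical readings taken a few minutes apart.
print(purge_progress([78, 71, 64, 58], target_pct=60))
```

If the readings stop decreasing before the target is reached, that is a hint to lower the two thresholds further, as described in the steps.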
The CpuPegging alert monitors CPU usage against a configured threshold.
When the CpuPegging alert is received, the process that consumes the most CPU can be identified from the System drawer on the left, under Process.
From the CLI of the affected server, the output of these commands provides some insight:
utils diagnose test
show process load cpu sorted
utils core active list
It is recommended to observe whether the CPU spike happens at a specific time or randomly. If it occurs randomly, detailed CUCM traces as well as RisDC perfmon logs are required in order to check what triggers the spike in CPU. If the alerts occur at a specific time of day, the cause could be a scheduled activity such as a Disaster Recovery System (DRS) backup or a CDR load.
Also, depending on which process consumes the most CPU, specific logs are collected for further investigation. For example, if the culprit is Tomcat, the Tomcat-related logs are needed.
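The distinction between scheduled and random spikes can be checked with a minimal sketch that groups alert times by hour of day. The function name, the 80% share used as a cutoff, and the sample timestamps are all assumptions made for illustration. The alert times are assumed to be collected by hand, for example from the alert history.

```python
from collections import Counter
from datetime import datetime


def dominant_alert_hour(timestamps, min_share=0.8):
    """Return the hour of day (0-23) that holds at least min_share of
    the alerts, or None when the alerts are spread across the day."""
    if not timestamps:
        return None
    hours = Counter(ts.hour for ts in timestamps)
    hour, count = hours.most_common(1)[0]
    # min_share is an arbitrary cutoff chosen for this illustration.
    return hour if count / len(timestamps) >= min_share else None


# Hypothetical alert times, for illustration only.
nightly = [datetime(2024, 1, day, 2, 5) for day in range(1, 6)]
scattered = [datetime(2024, 1, 1, hour, 0) for hour in (1, 9, 14, 22)]
print(dominant_alert_hour(nightly))    # 2
print(dominant_alert_hour(scattered))  # None
```

An hour returned by the function suggests a scheduled activity worth comparing against the backup and CDR schedules. A result of None suggests random spikes, for which detailed traces are needed.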
Use this section in order to confirm that your configuration works properly.
If the alerts are not dismissed after you follow the workarounds suggested here, or if the alerts seem to have an immediate impact on service, contact Cisco TAC with the necessary details: the CallManager version, the number of nodes in the cluster, the time and duration of the alert, and, in the case of CPU pegging, the process identified as the cause.
There is currently no specific troubleshooting information available for this configuration.