Cisco ACNS Software Alarms and Error Messages Guide, Release 5.5.x
Chapter 2: Alarms

Table of Contents

Alarms

Critical Alarms

Major Alarms

Minor Alarms


Alarms


This chapter lists the ACNS Release 5.5 alarms. Each alarm is followed by an explanation and a recommended action.
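Every alarm name in this chapter follows the same path layout. It starts with /alm/, then a severity prefix (crit, maj, or min, although some minor alarms use minor), then the issuing module, then optional instance components, and finally the alarm name. The message text may follow after a colon. The Python sketch below splits a name along that layout. It is illustrative only: parse_alarm_name and AlarmName are hypothetical helpers, not part of ACNS, and the layout is inferred from the names in this chapter rather than from a formal specification. The instance value disk01 stands in for the diskXX placeholder.

```python
from typing import NamedTuple, Tuple

# Severity prefixes as they appear in this chapter's alarm names.
# Both "min" and "minor" are used for minor alarms.
SEVERITY_PREFIXES = {"crit": "critical", "maj": "major", "min": "minor", "minor": "minor"}


class AlarmName(NamedTuple):
    severity: str                # "critical", "major", or "minor"
    module: str                  # issuing module, e.g. "sysmon" or "nodemgr"
    qualifiers: Tuple[str, ...]  # components between module and alarm, e.g. ("disk01",)
    alarm: str                   # final component, e.g. "disk_failure"


def parse_alarm_name(text: str) -> AlarmName:
    """Split '/alm/<severity>/<module>/[...]/<alarm>[: message]' into its parts."""
    path = text.split(":", 1)[0].strip()  # drop any message text after the colon
    parts = path.strip("/").split("/")
    if len(parts) < 4 or parts[0] != "alm":
        raise ValueError(f"not an alarm name: {text!r}")
    if parts[1] not in SEVERITY_PREFIXES:
        raise ValueError(f"unknown severity prefix: {parts[1]!r}")
    return AlarmName(SEVERITY_PREFIXES[parts[1]], parts[2], tuple(parts[3:-1]), parts[-1])


print(parse_alarm_name("/alm/crit/sysmon/disk01/disk_failure: A CE disk has failed."))
# AlarmName(severity='critical', module='sysmon', qualifiers=('disk01',), alarm='disk_failure')
```

Treating min and minor as the same severity keeps the helper consistent with the headings used in this chapter.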

Critical Alarms

/alm/crit/NHM/alarm_overload: Alarm Overload State has been entered.

Explanation    The alarm rate has exceeded the specified overload threshold. The Node Health Manager issues this alarm.

Recommended Action    Access the device and determine which services are raising the alarms. Take corrective action to resolve the service issues.

Useful commands:

show alarms status
show alarms
show alarms history
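The overload condition can be pictured as a sliding-window rate check. The Python sketch below makes several assumptions. The AlarmRateMonitor class, its window_s and threshold parameters, and the strictly-greater-than comparison are all hypothetical and do not describe how the Node Health Manager is actually implemented. The sketch models only entering the overload state, not leaving it.

```python
from collections import deque


class AlarmRateMonitor:
    """Hypothetical sliding-window alarm-rate check (not the NHM implementation)."""

    def __init__(self, window_s: float, threshold: int) -> None:
        self.window_s = window_s    # length of the observation window, in seconds
        self.threshold = threshold  # alarms allowed per window before overload
        self._times = deque()       # timestamps of alarms raised inside the window
        self.overloaded = False

    def record(self, now: float) -> bool:
        """Record one raised alarm; return True only when overload is first entered."""
        self._times.append(now)
        # Evict timestamps that have fallen out of the window.
        while self._times and self._times[0] <= now - self.window_s:
            self._times.popleft()
        entering = not self.overloaded and len(self._times) > self.threshold
        if entering:
            self.overloaded = True
        return entering


monitor = AlarmRateMonitor(window_s=10.0, threshold=3)
print([monitor.record(t) for t in (0, 1, 2, 3, 4)])  # [False, False, False, True, False]
```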

/alm/crit/NHM/-application name-/keepalive: Keepalive failure for -application name-. Timeout = n seconds.

Explanation    An application has not issued a keepalive to the Node Health Manager for the last n seconds, indicating that the application could be running in a degraded state. The Node Health Manager issues this alarm.

Recommended Action    Access the device and determine the state of the application. Take corrective action to resolve the issues that are causing the application to degrade.

Useful commands:

show alarms
show alarms history
(commands specific to the application name)
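The keepalive alarm fires when an application has been silent longer than its timeout. The Python sketch below shows that bookkeeping. KeepaliveTracker, its methods, and the application names are hypothetical illustrations, not the Node Health Manager's interface. Timestamps are plain numbers of seconds, as supplied by the caller.

```python
from typing import Dict, List


class KeepaliveTracker:
    """Hypothetical tracker that flags applications whose keepalives are overdue."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s              # the "n seconds" in the alarm text
        self._last_seen: Dict[str, float] = {}  # application -> time of last keepalive

    def keepalive(self, app: str, now: float) -> None:
        """Record that `app` issued a keepalive at time `now`."""
        self._last_seen[app] = now

    def overdue(self, now: float) -> List[str]:
        """Return applications that have not issued a keepalive for more than timeout_s."""
        return sorted(app for app, seen in self._last_seen.items()
                      if now - seen > self.timeout_s)


tracker = KeepaliveTracker(timeout_s=30)
tracker.keepalive("app_a", now=0)
tracker.keepalive("app_b", now=20)
print(tracker.overdue(now=40))  # ['app_a']
```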

/alm/crit/NHM/instance%d/test1: NHM Alarm Testing [string].

Explanation    This alarm is used for testing the Node Health Manager.

Recommended Action    None. This alarm should not occur during normal operation.

/alm/maj/NHM/test: NHM Alarm Testing [string].

Explanation    This alarm is used for testing the Node Health Manager.

Recommended Action    None. This alarm should not occur during normal operation.

/alm/crit/NHM/test4: NHM Alarm Testing [string].

Explanation    This alarm is used for testing the Node Health Manager.

Recommended Action    None. This alarm should not occur during normal operation.

/alm/crit/NHM2/instance%d/test1: NHM Alarm Testing [string].

Explanation    This alarm is used for testing the Node Health Manager.

Recommended Action    None. This alarm should not occur during normal operation.

/alm/crit/nodemgr/-service name-/servicedead: -service name- service died.

Explanation    A critical service failed. The node manager will attempt to restart the service, but the device may run in a degraded state. The device could reboot itself to avoid instability.

Recommended Action    Examine the syslog for messages relating to the cause of service failure.

/alm/crit/nodemgr/-service name-/svcdisabled: -service name- service has been disabled.

Explanation    The node manager tried restarting the specified service, but the service kept failing. The number of restarts exceeded the internal limit, and the service was disabled.

Recommended Action    Reload the device to reenable the service.
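The servicedead and svcdisabled alarms describe a restart-until-limit pattern. The Python sketch below illustrates only that pattern. The supervise function, its run_once callback, and the max_restarts parameter are hypothetical. The node manager's actual internal restart limit is not documented in this chapter.

```python
from typing import Callable, Tuple


def supervise(run_once: Callable[[], bool], max_restarts: int) -> Tuple[str, int]:
    """Start a service and restart it after each failure until a limit is reached.

    run_once() is a hypothetical callable that starts the service and returns
    True if it stays up, or False if it dies. Returns the final state and the
    number of restarts attempted.
    """
    restarts = 0
    while not run_once():
        # The service died (compare the servicedead alarm).
        if restarts == max_restarts:
            # Restart limit exhausted: stop trying (compare the svcdisabled alarm).
            return "disabled", restarts
        restarts += 1
    return "running", restarts


outcomes = iter([False, False, True])
print(supervise(lambda: next(outcomes), max_restarts=5))  # ('running', 2)
```

Once the service is in the disabled state, nothing in the loop restarts it again. This is why the recommended action is to reload the device.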

/alm/crit/sysmon/diskXX/disk_failure: A CE disk has failed.

Explanation    One of the disks attached to the CE has a severe error. The System Monitor issues this alarm.

Recommended Action    Access the device and run the show disk details command. If the problem persists, replace the disk.

/alm/crit/sysmon/diskXX/disk_SMARTinfo: A CE disk has severe early-prediction failure which requires immediate action.

Explanation    One of the disks attached to the CE has a severe early-prediction failure. For example, the disk has failed its SMART self-check. The System Monitor issues this alarm.

Recommended Action    Immediately back up the data on the disk. Restore the data after the disk error is corrected.

Major Alarms

/alm/maj/acquirer/%s/ContentFail: There are some contents failed to be acquired

Explanation    The root Content Engine failed to acquire some content.

Recommended Action    Log in to the root Content Engine, use the show stat acq err command to find out the cause of the problem, and resolve the problem.

/alm/maj/acquirer/%s/CrawlStartUrlFail: The start-url for a crawl job in the channel failed

Explanation    The root Content Engine could not fetch the start-url of a crawl job in this channel.

Recommended Action    Log in to the root Content Engine, use the show stat acq err command to find out the cause of the problem, and resolve the problem.

/alm/maj/acquirer/%s/ExceedChannelQuota: Total content size could not fit into channel disk quota.

Explanation    The total size of the content that the root Content Engine needs to acquire for this channel exceeds the channel disk quota.

Recommended Action    Remove some content from the manifest file, or increase the channel disk quota.

/alm/maj/acquirer/%s/ManifestFetchFail: Fail to fetch manifest file for channel

Explanation    The root Content Engine could not fetch the manifest file for this channel.

Recommended Action    Log in to the root Content Engine, use the show stat acq err command to find out the cause of the problem, and resolve the problem.

/alm/maj/acquirer/%s/ManifestParseFail: Fail to parse manifest file for channel

Explanation    The manifest file for this channel contains syntax errors.

Recommended Action    Log in to the root Content Engine, use the show stat acq err command to find out the cause of the problem, and resolve the problem.

/alm/maj/AD_DATABASE/svclowdisk: Alarm database is running low in disk space in STATEFS partition

Explanation    The database monitor service issues this alarm to indicate that the database is running low on disk space in the STATEFS partition. As a result, the content replication service (acquisition and distribution) has been temporarily stopped.

Recommended Action    Run the cms database maintenance command or schedule database maintenance more frequently to reclaim the disk space.

Useful commands:

#cms database maintenance
(config)#cms database maintenance

/alm/maj/NHM/instance%d/test2: NHM Alarm Testing [string].

Explanation    This alarm is used for testing the Node Health Manager.

Recommended Action    None. This alarm should not occur during normal operation.

/alm/maj/NHM2/instance%d/test2: NHM Alarm Testing [string].

Explanation    This alarm is used for testing the Node Health Manager.

Recommended Action    None. This alarm should not occur during normal operation.

/alm/maj/nodemgr/-service_name-/servicedead: -service name- service died.

Explanation    A major service failed. The service manager will attempt to restart the service. The alarm will be cleared if the service does not restart in a short while.

Recommended Action    Examine the syslog for messages relating to the cause of the service failure.

/alm/maj/sysmon/core%02d/core_dump: An user Core file or Kernel Crash dump has been generated.

Explanation    One or more of the software modules or the kernel has generated core files. The System Monitor issues this alarm.

Recommended Action    Access the device and check the /local1/core_dir or /local1/crash directory. Retrieve the core file through FTP, and contact Cisco TAC.

/alm/maj/sysmon/diskXX/disk_SMARTinfo: A CE disk has early-prediction failure.

Explanation    One of the disks attached to the Content Engine has an early-prediction failure, indicating that the disk could fail in the near future. The System Monitor issues this alarm.

Recommended Action    Back up the data on the disk, and prepare a replacement disk.

Minor Alarms

/alm/min/acquirer/%s/ContentUpdateFail: There are some contents failed to be re-checked

Explanation    The root Content Engine failed to recheck some of the contents.

Recommended Action    Log in to the root Content Engine, use the show stat acq err command to find out the cause of the problem, and resolve the problem.

/alm/min/acquirer/%s/ManifestParseWarning: Fail to parse manifest file for channel

Explanation    The manifest file for this channel contains syntax errors.

Recommended Action    Log in to the root Content Engine, use the show stat acq err command to find out the cause of the problem, and resolve the problem.

/alm/min/acquirer/%s/ManifestUpdateFail: Fail to re-check manifest file for channel

Explanation    The root Content Engine could not re-check the manifest file for this channel.

Recommended Action    Log in to the root Content Engine, use the show stat acq err command to find out the cause of the problem, and resolve the problem.

/alm/min/acquirer/zerobandwidth: specified content acquisition bandwidth is 0

Explanation    The device has been assigned as the root Content Engine for some channels, but its acquisition bandwidth is 0.

Recommended Action    Change the default acquisition bandwidth of the device. To change the bandwidth in the CDM GUI, go to the Devices page, select the device, choose Edit > Preposition, and change the default bandwidth of the device.

/alm/min/cms/cms_clock_alarm: Device clock is not synchronized with the primary CDM.

Explanation    The device clock needs to be synchronized with the primary CDM clock. The device in question could be a CE or a standby CDM. If the device is a Content Engine, clock synchronization is required for replication status, statistics monitoring, and program files to work correctly. If the device is a standby CDM, its clock must be synchronized with the primary CDM for CDM failover to work.

Recommended Action    Fix the clock on the device or the primary CDM.

/alm/min/cms/-instance-/cms_test_alarm: CMS test alarm with instance value -instance- was raised. Title is used in CDM UI.

Explanation    This test alarm is defined and used in CMS code. The alarm is identified by the tuple (340001, instance). This means that the system can raise several alarms that share the same ID, 340001, but have different instance values. An instance links the alarm to a particular data item (for example, a disk failure or a channel having A&D problems).

Recommended Action    The message displayed in the CDM GUI or the CLI tells you which action to take.
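Keying alarms by the (ID, instance) tuple lets several alarms share one ID without overwriting each other. The Python sketch below illustrates that idea. AlarmTable and its methods are hypothetical and are not the CMS data model. The instance strings "disk01" and "channel-7" are invented examples.

```python
from typing import Dict, List, Tuple


class AlarmTable:
    """Hypothetical registry of active alarms keyed by (alarm ID, instance)."""

    def __init__(self) -> None:
        self.active: Dict[Tuple[int, str], str] = {}

    def raise_alarm(self, alarm_id: int, instance: str, message: str) -> None:
        # Raising an (ID, instance) pair that is already active only updates its message.
        self.active[(alarm_id, instance)] = message

    def clear_alarm(self, alarm_id: int, instance: str) -> None:
        # Clearing one instance leaves other instances of the same ID active.
        self.active.pop((alarm_id, instance), None)

    def instances_of(self, alarm_id: int) -> List[str]:
        """Return the active instance values for one alarm ID."""
        return sorted(inst for (aid, inst) in self.active if aid == alarm_id)


table = AlarmTable()
table.raise_alarm(340001, "disk01", "disk problem")
table.raise_alarm(340001, "channel-7", "A&D problem")
table.clear_alarm(340001, "disk01")
print(table.instances_of(340001))  # ['channel-7']
```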

/alm/minor/MULTICAST_DATA_RECEIVER/svcbwclosed: Alarm backup multicast sender has been activated

Explanation    The backup multicast sender issues this alarm to indicate that it has been activated. Either the primary multicast sender has a problem, or the primary and backup multicast senders cannot communicate with each other, possibly because of network connectivity issues.

Recommended Action    Troubleshoot the multicast sender service on the primary multicast sender and also check the network connectivity between the primary and the backup multicast sender.

Useful commands:

show multicast
show distribution mcast-data-sender
show statistics distribution mcast-data-sender

/alm/minor/MULTICAST_DATA_SENDER/svcbwclosed: Alarm Mout bandwidth is set to zero while jobs are scheduled

Explanation    The multicast data sender issues this alarm to indicate that the Mout bandwidth on the device is scheduled to be zero, but the multicast data sender currently has a job scheduled or running.

Recommended Action    Access the CDM, determine if the bandwidth values and bandwidth schedules are correctly configured, and verify the effective bandwidth and job statistics on the device.

Useful commands:

show distribution
show distribution channels
show statistics distribution mcast-data-sender
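The svcbwclosed alarms in this chapter cover Mout for the multicast data sender, Din for the unicast data receiver, and Dout for the unicast data sender. All of them share one condition: the effective bandwidth for that direction is zero while jobs are scheduled or running. The Python sketch below states only that condition. DirectionState, its fields, and the kbps unit are hypothetical illustrations, not ACNS statistics fields.

```python
from dataclasses import dataclass


@dataclass
class DirectionState:
    """Hypothetical snapshot of one bandwidth direction (Din, Dout, or Mout)."""
    effective_bandwidth_kbps: int  # unit is illustrative
    active_jobs: int               # jobs scheduled or running in this direction


def bandwidth_closed(state: DirectionState) -> bool:
    """True when bandwidth is zero while jobs are scheduled or running."""
    return state.effective_bandwidth_kbps == 0 and state.active_jobs > 0


print(bandwidth_closed(DirectionState(effective_bandwidth_kbps=0, active_jobs=2)))  # True
```

A zero bandwidth with no pending jobs is not an alarm condition in this sketch. This matches the alarm text, which ties the alarm to scheduled jobs.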

/alm/min/NHM/instance%d/test3: NHM Alarm Testing [string].

Explanation    This alarm is used for testing the Node Health Manager.

Recommended Action    None. This alarm should not occur during normal operation.

/alm/min/NHM/test5: NHM Alarm Testing [string].

Explanation    This alarm is used for testing the Node Health Manager.

Recommended Action    None. This alarm should not occur during normal operation.

/alm/min/NHM2/instance%d/test3: NHM Alarm Testing [string].

Explanation    This alarm is used for testing the Node Health Manager.

Recommended Action    None. This alarm should not occur during normal operation.

/alm/min/nodemgr/-service_name-/servicedead: -service name- service died.

Explanation    A minor service failed. The node manager will attempt to restart this service. The alarm will be cleared if the service does not restart in a short while.

Recommended Action    Examine the syslog for messages relating to the cause of the service failure.

/alm/min/standby/grp-group-/groupdown: -group- Specified standbygroup is down.

Explanation    None of the member interfaces in the specified standby group could be brought up.

Recommended Action    Check the configuration and cabling of the member interfaces.

/alm/min/standby/grp-group-/-ifc-slot-port-/linkdown: -group-ifc-slot-port- Specified interface in the standbygroup is down.

Explanation    The specified interface in the standby group is down. There could be a link failure on the interface, or it may have been shut down on purpose.

Recommended Action    Check the configuration and cabling of the specified interface.

/alm/min/standby/grp-group-/-ifc-slot-port-/maxerror: -group-ifc-slot-port- The specified interface has seen errors exceeding maximum allowable error count.

Explanation    The specified interface encountered errors that exceed the maximum allowable error count.

Recommended Action    Check the cabling or configuration of the specified interface.

/alm/min/standby/grp-group-/-ifc-slot-port-/routedown: -group-ifc-slot-port- Unable to reach the configured default gateway on the specified interface.

Explanation    The system could not reach the configured default gateway on the specified interface in the standby group.

Recommended Action    Check the network configuration on the specified interface.
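The four standby alarms above correspond to simple per-interface and per-group conditions. In the sketch below, a member interface is down when it cannot be brought up. It has too many errors when its error count exceeds the maximum. Its route is down when it is up but cannot reach the default gateway. The group is down when no member could be brought up. MemberInterface, standby_alarms, the interface names, and the max_errors limit are hypothetical, and the ACNS default error limit is not documented in this chapter.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MemberInterface:
    name: str                # hypothetical label, e.g. "GigabitEthernet 1/0"
    link_up: bool            # could the interface be brought up?
    error_count: int         # errors seen on the interface
    gateway_reachable: bool  # is the configured default gateway reachable?


def standby_alarms(group: int, members: List[MemberInterface],
                   max_errors: int) -> List[Tuple[str, str]]:
    """Return (component, alarm) pairs for one standby group.

    max_errors is an illustrative limit, not an ACNS default.
    """
    alarms = []
    for m in members:
        if not m.link_up:
            alarms.append((m.name, "linkdown"))
        if m.error_count > max_errors:
            alarms.append((m.name, "maxerror"))
        if m.link_up and not m.gateway_reachable:
            alarms.append((m.name, "routedown"))
    if not any(m.link_up for m in members):
        # No member interface could be brought up.
        alarms.append((f"grp{group}", "groupdown"))
    return alarms


members = [MemberInterface("GigabitEthernet 1/0", False, 0, False),
           MemberInterface("GigabitEthernet 2/0", True, 12, True)]
print(standby_alarms(1, members, max_errors=10))
# [('GigabitEthernet 1/0', 'linkdown'), ('GigabitEthernet 2/0', 'maxerror')]
```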

/alm/min/sysmon/diskXX/disk_SMARTinfo: A CE disk has minor early-prediction failure.

Explanation    One of the disks attached to the CE has a minor early-prediction failure, so the disk might fail sometime in the future. The System Monitor issues this alarm.

Recommended Action    Look for indications of severe errors, back up the data on the disk, and prepare a replacement disk.

/alm/minor/UNICAST_DATA_RECEIVER/svcbwclosed: Alarm Din bandwidth is set to zero while jobs are scheduled

Explanation    The unicast data receiver issues this alarm to indicate that the device has Din bandwidth set to zero but the unicast data receiver has a job scheduled or running.

Recommended Action    Access the CDM, determine if the bandwidth values and bandwidth schedules are correctly configured, and verify the effective bandwidth and job statistics on the device.

Useful commands:

show distribution
show distribution channels
show statistics distribution unicast-data-receiver

/alm/minor/UNICAST_DATA_RECEIVER/svcnomcastconnectivity: Alarm there is no multicast network connectivity between the multicast sender and this device.

Explanation    The unicast data receiver issues this alarm to indicate that the device, acting as a multicast receiver, cannot receive PGM multicast packets from the multicast sender because there is no multicast network connectivity between the sender and this device.

Recommended Action    Check and fix the multicast network connectivity between the sender and the receiver.

Useful command:

multicast connectivity-test

/alm/minor/UNICAST_DATA_RECEIVER/svcnomcastenable: Alarm multicast is disabled although the CE is a multicast sender and/or receiver or it is subscribed to multicast channels.

Explanation    The unicast data receiver issues this alarm to indicate that the device does not have multicast service enabled, although it is expected to be involved in multicast distribution.

Recommended Action    Enable the multicast license and the service on the device.

Useful commands:

(config)#multicast accept-license-agreement
(config)#multicast evaluate
(config)#multicast license-key
(config)#multicast enable

/alm/minor/UNICAST_DATA_SENDER/svcbwclosed: Alarm Dout bandwidth is set to zero while jobs are scheduled

Explanation    The unicast data sender issues this alarm to indicate that the Dout bandwidth is set to zero but the unicast data sender has a job running.

Recommended Action    Access the CDM, determine if the bandwidth values and bandwidth schedules are correctly configured, and verify the effective bandwidth and job statistics on the device.

Useful commands:

show distribution
show distribution channels
show statistics distribution unicast-data-sender