Cisco Internet Streamer CDS 2.4 Alarms and Error Messages Guide

Alarms


This chapter lists the Cisco Internet Streamer CDS Release 2.4 alarms. Each alarm is followed by an explanation and recommended action.

Severity Level

An alarm can have one of the following three severity levels: critical, major, or minor:

A critical alarm indicates that a critical problem exists somewhere in the network. Critical alarms cause failover and should be cleared immediately.

A major alarm indicates that a serious problem exists that is disrupting service. Major alarms differ from critical alarms in that they do not cause failovers. Major alarms should also be cleared immediately.

Minor alarms should be noted and cleared as soon as possible.
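The severity rules above can be summarized as a small lookup model. The following Python sketch is illustrative only: `Severity` and `triage_order` are hypothetical names, not part of the CDS software, and they encode only the failover and clearing rules stated in this section.

```python
from enum import Enum


class Severity(Enum):
    """Alarm severity levels and the handling rules stated in this chapter.

    Each value is (causes_failover, clear_immediately). Hypothetical model only.
    """
    CRITICAL = (True, True)
    MAJOR = (False, True)
    MINOR = (False, False)  # clear as soon as possible, not immediately

    @property
    def causes_failover(self) -> bool:
        return self.value[0]

    @property
    def clear_immediately(self) -> bool:
        return self.value[1]


# Lower rank means the alarm should be handled first.
_RANK = {Severity.CRITICAL: 0, Severity.MAJOR: 1, Severity.MINOR: 2}


def triage_order(alarms):
    """Sort (alarm_id, Severity) pairs so the most urgent alarms come first."""
    return sorted(alarms, key=lambda a: (_RANK[a[1]], a[0]))
```

For example, sorting a minor 540001 (shutdown), a critical 330002 (servicedead), and a major 540002 (linkdown) with `triage_order` places 330002 first and 540001 last.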

Critical Alarms

Alarm 330001 (svcdisabled) - service name - service has been disabled.

Explanation    The Node Manager repeatedly restarted the specified service, but the service kept failing. The number of restarts exceeded an internal limit, so the service has been disabled.
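The restart-then-disable behavior behind the svcdisabled and servicedead alarms can be sketched as follows. This is a hypothetical Python model, not the Node Manager's actual code: `SupervisedService` and `max_restarts` are illustrative names, and `max_restarts` stands in for the undocumented internal limit.

```python
from dataclasses import dataclass


@dataclass
class SupervisedService:
    """Hypothetical model of a service watched by the Node Manager."""
    name: str
    max_restarts: int  # stand-in for the undocumented internal restart limit
    restarts: int = 0
    disabled: bool = False

    def on_death(self) -> str:
        """Handle one service death and return the alarm name it would raise."""
        if self.disabled:
            return "svcdisabled"
        self.restarts += 1  # count this restart attempt
        if self.restarts > self.max_restarts:
            # Too many restarts: stop restarting and disable the service.
            self.disabled = True
            return "svcdisabled"
        # Still under the limit: the service would be restarted.
        return "servicedead"
```

With `max_restarts=2`, the first two deaths raise servicedead and the third disables the service.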

Alarm 330002 (servicedead) - service name - service died.

Explanation    A critical service has died. Attempts are made to restart this service, but the device may run in a degraded state.

Recommended Action    The device could reboot itself to avoid instability. Examine the syslog for messages relating to the cause of service death.

Alarm 335000 (alarm_overload) Alarm Overload State has been entered.

Explanation    The Node Health Manager issues this alarm to indicate that the device is raising alarms at a rate that exceeds the overload threshold.

Recommended Action    Access the device and determine what services are raising the alarms. Take corrective action to resolve the individual services' issues.
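An "alarm rate that exceeds the overload threshold" can be modeled as a sliding-window count. This Python sketch assumes a count-per-window threshold; `AlarmRateMonitor`, `threshold`, and `window_s` are hypothetical, because the actual Node Health Manager threshold and window are not documented here.

```python
from collections import deque


class AlarmRateMonitor:
    """Sliding-window check for an alarm-overload condition (hypothetical model)."""

    def __init__(self, threshold: int, window_s: float):
        self.threshold = threshold  # max alarms tolerated per window (assumed)
        self.window_s = window_s    # window length in seconds (assumed)
        self._times = deque()

    def record(self, t: float) -> bool:
        """Record an alarm raised at time t; return True if in overload state."""
        self._times.append(t)
        # Drop alarms that have fallen outside the window.
        while self._times and self._times[0] <= t - self.window_s:
            self._times.popleft()
        return len(self._times) > self.threshold
```

With `threshold=3` and `window_s=10.0`, alarms at t = 0, 1, 2, 3 enter overload on the fourth alarm, and a single alarm at t = 20 leaves it.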

Alarm 335001 (keepalive) Keepalive failure for - application name - . Timeout = n seconds.

Explanation    An application is not responding, which indicates that it may not be operating properly.

Recommended Action    Access the device and determine the state of the specified application.
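The keepalive check reduces to comparing the time since the application's last reply against the timeout shown in the alarm text. A minimal Python sketch, assuming the caller tracks reply times in seconds; `check_keepalive` is a hypothetical helper, and the message mirrors the alarm 335001 title.

```python
from typing import Optional


def check_keepalive(app: str, last_reply_s: float, now_s: float, timeout_s: int) -> Optional[str]:
    """Return alarm 335001 text if app has not replied within timeout_s, else None."""
    if now_s - last_reply_s > timeout_s:
        return f"Keepalive failure for {app}. Timeout = {timeout_s} seconds."
    return None
```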

Alarm 335003 (test1) NHM Alarm Testing [string].

Explanation    This alarm is used for testing the Node Health Manager.

Recommended Action    None. This alarm should never occur during normal operation.

Alarm 335006 (test4) NHM Alarm Testing [string].

Explanation    This alarm is used for testing the Node Health Manager.

Recommended Action    None. This alarm should never occur during normal operation.

Alarm 335008 (test1) NHM Alarm Testing [string].

Explanation    This alarm is used for testing the Node Health Manager.

Recommended Action    None. This alarm should never occur during normal operation.

Alarm 445002 (disk_smartfailcrit) An SE disk has severe early-prediction failure which requires immediate action.

Explanation    The System Monitor issues this alarm to indicate that one of the disks attached to the SE has a severe early-prediction failure (for example, the disk has failed its SMART self-check).

Recommended Action    Immediately back up the data on the disk to prevent data loss, and replace the disk after the SE marks it as bad.

Alarm 445005 (disk_softraidcrit) A SoftRAID device has malfunctioned and requires immediate action.

Explanation    The System Monitor issues this alarm to indicate that a SoftRAID device has malfunctioned (for example, both component disks of a RAID-1 array have become inaccessible or faulty).

Recommended Action    Replace the disks and restore the data from backup storage, or remanufacture and reload the disks.

Major Alarms

Alarm 100002 (ManifestFetchFail) Fail to fetch manifest file for Delivery Service.

Explanation    There is a problem fetching the manifest file for this delivery service.

Recommended Action    Log in to the Content Acquirer, execute the show stat acq err command to identify the problem, and resolve it.

Alarm 100003 (ManifestParseFail) Fail to parse manifest file for Delivery Service.

Explanation    The manifest file for this delivery service contains syntax errors.

Recommended Action    Log in to the Content Acquirer, execute the show stat acq err command to identify the problem, and resolve it.

Alarm 100005 (ExceedQuota) Total content size could not fit into the Delivery Service disk quota.

Explanation    The total size of the content acquired for this delivery service exceeds the delivery service disk quota.

Recommended Action    Either remove some content from the manifest file or increase the delivery service disk quota.
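The ExceedQuota condition is a simple sum-versus-limit comparison. This Python sketch computes how far the acquired content overshoots the quota; `quota_overage` is a hypothetical helper, and sizes are assumed to be in bytes.

```python
from typing import Iterable


def quota_overage(content_sizes_bytes: Iterable[int], quota_bytes: int) -> int:
    """Bytes by which acquired content exceeds the delivery service quota (0 if it fits)."""
    return max(0, sum(content_sizes_bytes) - quota_bytes)
```

A nonzero result is the amount that must either be removed from the manifest file or added to the quota before the content fits.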

Alarm 100006 (CrawlStartUrlFail) The start-url for a crawl job in the Delivery Service failed.

Explanation    There is a problem fetching the start URL of a crawl job in this delivery service.

Recommended Action    Log in to the Content Acquirer, execute the show status acquirer error command to identify the problem, and resolve it.

Alarm 100007 (ContentFail) There are some contents that failed to be acquired.

Explanation    Some content items could not be acquired.

Recommended Action    Log in to the Content Acquirer, execute the show status acquirer error command to identify the problem, and resolve it.

Alarm 213501 (svcnomcastenable) Alarm multicast is disabled although the SE is a multicast sender and receiver, or it is subscribed to a multicast Delivery Service.

Explanation    The unicast data receiver issues this alarm to indicate that the device does not have multicast service enabled, although it is expected to be involved in multicast distribution.

Recommended Action    Enable the multicast license and service on the device.

Alarm 330003 (servicedead) - service name - service died.

Explanation    The Node Manager found the specified service to be dead. Attempts are made to restart this service.

Recommended Action    Examine the syslog for messages relating to the cause of service death. The alarm is cleared if the service stays alive and does not restart soon.

Alarm 335002 (test) NHM Alarm Testing [string].

Explanation    This alarm is used for testing the Node Health Manager.

Recommended Action    None. This alarm should never occur during normal operation.

Alarm 335004 (test2) NHM Alarm Testing [string].

Explanation    This alarm is used for testing the Node Health Manager.

Recommended Action    None. This alarm should never occur during normal operation.

Alarm 335009 (test2) NHM Alarm Testing [string].

Explanation    This alarm is used for testing the Node Health Manager.

Recommended Action    None. This alarm should never occur during normal operation.

Alarm 335010 (test3) NHM Alarm Testing [string].

Explanation    This alarm is used for testing the Node Health Manager.

Recommended Action    None. This alarm should never occur during normal operation.

Alarm 445001 (core_dump) A User Core file or Kernel Crash dump has been generated.

Explanation    The System Monitor issues this alarm to indicate that one or more of the software modules or the kernel has generated core files.

Recommended Action    Access the device, check the /local1/core_dir or /local1/crash directory, retrieve the core file through FTP, and contact Cisco TAC.

Alarm 445003 (disk_smartfailmajor) An SE disk has early-prediction failure.

Explanation    The System Monitor issues this alarm to indicate that one of the disks attached to the SE has an early-prediction failure. This alarm indicates that the disk could fail in the near future.

Recommended Action    Prepare for the impending disk failure, for example by backing up the data and preparing a replacement disk.

Alarm 520004 (GroupDown) - group - Specified standby group is down.

Explanation    None of the member interfaces in the specified standby group could be brought up.

Recommended Action    Check the configuration and cabling of the member interfaces.

Alarm 540002 (linkdown) Network interface is inactive or down.

Explanation    The network interface is inactive or down.

Recommended Action    Check the cables connected to the network device.

Alarm 661001 (svclowdisk) Alarm database is running low in disk space in the STATEFS partition.

Explanation    The database monitor service issues this alarm to indicate that the database is running low on disk space in the STATEFS partition, so the content replication service (acquisition and distribution) has been temporarily stopped.

Recommended Action    Execute the cms database maintenance command or schedule database maintenance more frequently to reclaim the disk space.

Alarm 700002 (cms_clock_alarm) The device clock is not synchronized with the primary CDSM. Enabling NTP on all the devices is strongly recommended.

Explanation    If this device is an SE, its clock must be synchronized with the primary CDSM for replication status, statistics monitoring, and program files to work correctly. If this device is a standby CDSM, its clock must be synchronized with the primary CDSM for CDSM failover to work.

Recommended Action    Fix the clock on either this device or the primary CDSM.

Minor Alarms

Alarm 100001 (zerobandwidth) specified content acquisition bandwidth is 0.

Explanation    The device has been assigned as the Content Acquirer for some delivery services, but its acquisition bandwidth is 0.

Recommended Action    In the CDSM, go to the Devices page, select this device, click Edit, click the Select Preposition link on the left side of the screen, and then change its default bandwidth.

Alarm 100004 (ManifestUpdateFail) Fail to recheck manifest file for Delivery Service.

Explanation    There is a problem rechecking the manifest file for this delivery service.

Recommended Action    Log in to the Content Acquirer, execute the show status acquirer error command to identify the problem, and resolve it.

Alarm 100008 (ContentUpdateFail) There are some contents that failed to be rechecked.

Explanation    Some content items could not be rechecked.

Recommended Action    Log in to the Content Acquirer, execute the show status acquirer error command to identify the problem, and resolve it.

Alarm 100009 (ManifestParseWarning) Fail to parse manifest file for Delivery Service.

Explanation    The manifest file for this delivery service contains syntax warnings.

Recommended Action    Log in to the Content Acquirer, execute the show status acquirer error command to display the warnings, and resolve the problem.

Alarm 212500 (svcbwclosed) Alarm Dout bandwidth is set to zero while jobs are scheduled.

Explanation    The unicast data sender issues this alarm to indicate that the Dout bandwidth is scheduled to be zero while the unicast data sender has a job running.

Recommended Action    Access the CDSM and determine whether the bandwidth values and bandwidth schedules are correctly configured, and verify the effective bandwidth and job statistics on the device.

Alarm 213500 (svcbwclosed) Alarm Din bandwidth is set to zero while jobs are scheduled.

Explanation    The unicast data receiver issues this alarm to indicate that the Din bandwidth is scheduled to be zero while the unicast data receiver has a job scheduled or running.

Recommended Action    Access the CDSM and determine whether the bandwidth values and bandwidth schedules are correctly configured, and verify the effective bandwidth and job statistics on the device.

Alarm 213502 (svcnomcastconnectivity) There is no multicast network connectivity between the multicast sender and this device.

Explanation    The unicast data receiver issues this alarm to indicate that the device, acting as a multicast receiver, cannot receive Pragmatic General Multicast (PGM) packets from a multicast sender because there is no multicast network connectivity between the multicast sender and this device.

Recommended Action    Check and fix the multicast network connectivity between the sender and the receiver.

Alarm 213503 (svcunsspaceproblem) There is a unified name space problem while replicating and so some NACKs are suppressed.

Explanation    The unicast data receiver issues this alarm to indicate that the device, acting as a multicast receiver, cannot receive files because of a unified namespace (UNS) problem. It stops sending NACKs for the files that failed because of the UNS problem.

Recommended Action    Check and fix the UNS-related issues in the multicast receiver SE.

Alarm 213504 (svcnacksuppressed) Alarm that Multicast Receiver has stopped NACKs due to heavy loss.

Explanation    The unicast data receiver issues this alarm to indicate that the device, acting as a multicast receiver, has been unable to receive multicast files for a considerable time and has stopped sending NACKs for those files.

Recommended Action    Check the multicast network for problems. The receiver resumes sending NACKs after at least one file is received successfully.
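The NACK suppression behavior of alarm 213504 can be sketched as a small state machine: NACKs stop after a prolonged period without a successful file and resume once a file arrives. This Python model is hypothetical; `NackController` and `loss_timeout_s` are illustrative names, because the guide says only "a considerable time".

```python
class NackController:
    """Hypothetical model of NACK suppression on a multicast receiver."""

    def __init__(self, loss_timeout_s: float, start_s: float = 0.0):
        self.loss_timeout_s = loss_timeout_s  # assumed stand-in for "a considerable time"
        self.last_success_s = start_s
        self.suppressed = False

    def on_file_received(self, now_s: float) -> None:
        """A successful file ends suppression, so NACKs resume."""
        self.last_success_s = now_s
        self.suppressed = False

    def should_send_nack(self, now_s: float) -> bool:
        """Return False once loss has lasted longer than the timeout."""
        if now_s - self.last_success_s > self.loss_timeout_s:
            self.suppressed = True  # the point at which alarm 213504 would be raised
        return not self.suppressed
```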

Alarm 215003 (svcdevfailover) Alarm backup multicast sender has been activated.

Explanation    The backup multicast sender issues this alarm to indicate that it has been activated because either the primary multicast sender has a problem or the primary and backup multicast senders cannot communicate with each other, possibly because of network connection issues.

Recommended Action    Troubleshoot the multicast sender service on the primary multicast sender and check the network connectivity between the primary and backup multicast senders.

Alarm 215500 (svcbwclosed) Alarm Mout bandwidth is set to zero while jobs are scheduled.

Explanation    The multicast data sender issues this alarm to indicate that the Mout bandwidth is scheduled to be zero while the multicast data sender has a job scheduled or running.

Recommended Action    Access the CDSM and determine whether the bandwidth values and bandwidth schedules are correctly configured, and verify the effective bandwidth and job statistics on the device.
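The three svcbwclosed alarms (212500 for Dout, 213500 for Din, 215500 for Mout) share one condition: the bandwidth in effect for a direction is zero while jobs for that direction are scheduled or running. A Python sketch of that check; the function name and the input dictionaries are hypothetical, not a CDS interface.

```python
from typing import Dict, List

# Alarm IDs for the three svcbwclosed alarms, keyed by bandwidth direction.
SVCBWCLOSED_IDS = {"Dout": 212500, "Din": 213500, "Mout": 215500}


def svcbwclosed_alarms(scheduled_kbps: Dict[str, int], active_jobs: Dict[str, int]) -> List[int]:
    """Return the svcbwclosed alarm IDs whose condition currently holds.

    scheduled_kbps: bandwidth currently in effect for each direction (hypothetical input).
    active_jobs:    number of jobs scheduled or running for each direction.
    """
    return sorted(
        SVCBWCLOSED_IDS[direction]
        for direction, kbps in scheduled_kbps.items()
        if direction in SVCBWCLOSED_IDS and kbps == 0 and active_jobs.get(direction, 0) > 0
    )
```

A zero bandwidth with no pending jobs raises nothing, which is why the recommended action asks you to check both the bandwidth schedule and the job statistics.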

Alarm 330004 (servicedead) - service name - service died.

Explanation    The Node Manager found the specified service to be dead. Attempts are made to restart this service.

Recommended Action    Examine the syslog for messages relating to the cause of service death. The alarm is cleared if the service stays alive and does not restart in a short while.

Alarm 335005 (test3) NHM Alarm Testing [string].

Explanation    This alarm is used for testing the Node Health Manager.

Recommended Action    None. This alarm should never occur during normal operation.

Alarm 335007 (test5) NHM Alarm Testing [string].

Explanation    This alarm is used for testing the Node Health Manager.

Recommended Action    None. This alarm should never occur during normal operation.

Alarm 400000 (wesvcthresholdexceeded) WebEngine has reached service threshold limits.

Explanation    The WebEngine service has reached its license limits or the limits configured with the webengine max-concurrent-sessions command.

Recommended Action    Avoid further service requests to this device.

Alarm 445000 (disk_failure) An SE disk has failed.

Explanation    The System Monitor issues this alarm to indicate that one of the disks attached to the SE is not responding.

Recommended Action    Access the device and execute the show disk details command. If the problem persists, replace the disk.

Recommended Action    Watch the disk for early indication of errors. If more severe SMART errors or disk errors appear, take action accordingly.

Alarm 445005 (disk_softraidcrit) A SoftRAID device has malfunctioned and requires immediate action.

Explanation    The System Monitor issues this alarm to indicate that a SoftRAID device has malfunctioned (for example, both component disks of a RAID-1 array have become inaccessible or faulty).

Recommended Action    Replace the disks and restore data from backup storage, or remanufacture and reload the disks.

Alarm 445006 (disk_softraidminor) A SoftRAID device has become degraded and requires immediate action.

Explanation    The System Monitor issues this alarm to indicate that a SoftRAID device has become degraded (for example, one disk of a RAID-1 array has become inaccessible or faulty).

Recommended Action    Ensure there is a current data backup, replace the faulty disk, and then reconstruct the RAID array.
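The difference between disk_softraidminor (445006) and disk_softraidcrit (445005) comes down to how many component disks of a two-disk RAID-1 mirror are still usable. A Python sketch of that mapping; `raid1_alarm` is a hypothetical helper, not System Monitor code.

```python
from typing import Optional, Sequence


def raid1_alarm(component_ok: Sequence[bool]) -> Optional[str]:
    """Map the health of a two-disk RAID-1 array to the alarm it would raise.

    component_ok holds one flag per component disk (True = accessible and healthy).
    """
    if len(component_ok) != 2:
        raise ValueError("this model covers a RAID-1 array with exactly two component disks")
    healthy = sum(component_ok)
    if healthy == 2:
        return None                  # mirror intact: no alarm
    if healthy == 1:
        return "disk_softraidminor"  # degraded: one disk still holds the data (445006)
    return "disk_softraidcrit"       # both disks lost: malfunction (445005)
```

This is why the degraded case allows a rebuild from the surviving disk, while the malfunction case requires restoring from backup.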

Alarm 511010 (svcthresholdexceeded) WMT has reached service threshold limits.

Explanation    The Windows Media Technologies (WMT) service has reached its license limits or the limits configured with the wmt max-concurrent-sessions or bandwidth wmt outgoing command.

Recommended Action    Avoid further service requests to this device.

Alarm 511011 (fmsthresholdexceeded) FMS has reached service threshold limits.

Explanation    Flash Media Streaming service has reached concurrent connection limits.

Recommended Action    Avoid further service requests to this device, or contact Cisco TAC for more connection licenses.

Alarm 511012 (mssvcthresholdexceeded) Movie Streamer has reached service threshold limits.

Explanation    The Movie Streamer service has reached its license limits or its configured limits.

Recommended Action    Avoid further service requests to the device.
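The service-threshold alarms (400000 for WebEngine, 511010 for WMT, 511011 for Flash Media Streaming, and 511012 for Movie Streamer) fire when a service reaches either its license limit or its configured limit, so the effective ceiling is the lower of the two. A Python sketch under that reading; `threshold_exceeded` and its parameters are hypothetical and count concurrent sessions only.

```python
from typing import Optional


def threshold_exceeded(active_sessions: int, license_limit: int,
                       configured_limit: Optional[int] = None) -> bool:
    """True when active sessions reach whichever limit applies first."""
    limit = license_limit if configured_limit is None else min(license_limit, configured_limit)
    return active_sessions >= limit
```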

Alarm 520001 (LinkDown) -group-ifc-slot-port- Specified interface in the standby group is down.

Explanation    The specified interface in the standby group is down. There could have been a link failure on the interface or it may have been shut down on purpose.

Recommended Action    Check the configuration and cabling of the specified interface.

Alarm 520002 (RouteDown) -group-ifc-slot-port- Unable to reach the configured default gateway on the specified interface.

Explanation    Unable to reach the configured default gateway on the specified interface in the standby group.

Recommended Action    Check the network configuration on the specified interface.

Alarm 520003 (MaxError) -group-ifc-slot-port- The specified interface has seen errors exceeding maximum allowable error count.

Explanation    The specified interface has seen errors exceeding the maximum allowable error count.

Recommended Action    Check the cabling or configuration of the specified interface.

Alarm 540001 (shutdown) Network interface is shutdown.

Explanation    The network interface is shut down.

Recommended Action    Check the interface configuration.

Alarm 700001 (cms_test_alarm) CMS test alarm with instance value - instance was raised. The title is used in the CDSM GUI.

Explanation    This is a test alarm defined and used in CMS code. This alarm is identified by a tuple (340001, instance), so the system may have several raised alarms with the 340001 ID, each with a different instance value. The instance usually links an alarm to a particular data item, such as a failed disk or a delivery service with acquisition and distribution (A&D) problems.

Recommended Action    Advise the user how to handle this raised alarm. This is shown in the CDSM GUI or command-line interface (CLI).
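Identifying an alarm by an (alarm ID, instance) tuple means several instances of the same alarm ID can be raised and cleared independently. A Python sketch of such a table; the helper names and the instance strings are hypothetical, not the CMS implementation.

```python
from typing import Dict, List, Tuple

# Active alarms keyed by (alarm_id, instance); the value is the alarm title.
ActiveAlarms = Dict[Tuple[int, str], str]


def raise_alarm(active: ActiveAlarms, alarm_id: int, instance: str, title: str) -> None:
    """Raise (or re-raise) one alarm instance; re-raising does not duplicate it."""
    active[(alarm_id, instance)] = title


def clear_alarm(active: ActiveAlarms, alarm_id: int, instance: str) -> None:
    """Clear one alarm instance, leaving other instances of the same ID raised."""
    active.pop((alarm_id, instance), None)


def instances_of(active: ActiveAlarms, alarm_id: int) -> List[str]:
    """List the instances currently raised for a given alarm ID."""
    return sorted(inst for (aid, inst) in active if aid == alarm_id)
```

Raising 340001 for "disk01" and "disk02" yields two entries, and clearing "disk01" leaves the "disk02" instance raised.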

Recommended Action    Restart the Remote execution agent by using the CLI.

Alarm 1000010 (ManifestEmptyContent) Parsed Manifest file does not have any items to process.

Explanation    The manifest file does not contain any single or crawl items to process.

Recommended Action    Edit the manifest file of this delivery service to have one or more items to process.