Cisco Internet Streamer CDS 2.5 Alarms and Error Messages Guide
Alarms

This chapter lists the Cisco Internet Streamer CDS Release 2.5 alarms. Each alarm is followed by an explanation and recommended action. The chapter also defines the six generic SNMP alarm traps.

Severity Level

An alarm can have one of three severity levels: critical, major, or minor.

A critical alarm indicates that a critical problem exists somewhere in the network. Critical alarms cause failover and should be cleared immediately.

A major alarm indicates that a serious problem exists that is disrupting service. Major alarms differ from critical alarms in that they do not cause failover. Major alarms should also be cleared immediately.

Minor alarms should be noted and cleared as soon as possible.

Critical Alarms

Alarm 330001 (svcdisabled) - service name - service has been disabled.

Explanation    The Node Manager tried restarting the specified service but the service kept restarting. The number of restarts has exceeded an internal limit and the service has been disabled.

Recommended Action    The device may have to be reloaded for the service to be re-enabled.

Alarm 330002 (servicedead) - service name - service died.

Explanation    A critical service has died. Attempts are made to restart this service, but the device may run in a degraded state.

Recommended Action    The device could reboot itself to avoid instability. Examine the syslog for messages relating to the cause of service death.

Alarm 335000 (alarm_overload) Alarm Overload State has been entered.

Explanation    The Node Health Manager issues this alarm to indicate that the device is raising alarms at a rate that exceeds the overload threshold.

Recommended Action    Access the device and determine what services are raising the alarms. Take corrective action to resolve the individual services' issues.

Alarm 335001 (keepalive) Keepalive failure for - application name - Timeout = n seconds.

Explanation    An application is not responding, which indicates that it may not be operating properly.

Recommended Action    Access the device and determine the state of the specific application.

Alarm 335003 (test1) NHM Alarm Testing [string].

Explanation    This alarm is used for testing the Node Health Manager.

Recommended Action    None. This alarm should never occur during normal operation.

Alarm 335006 (test4) NHM Alarm Testing [string].

Explanation    This alarm is used for testing the Node Health Manager.

Recommended Action    None. This alarm should never occur during normal operation.

Alarm 335008 (test1) NHM Alarm Testing [string].

Explanation    This alarm is used for testing the Node Health Manager.

Recommended Action    None. This alarm should never occur during normal operation.

Alarm 445002 (disk_smartfailcrit) An SE disk has severe early-prediction failure which requires immediate action.

Explanation    The SYSMON issues this alarm to indicate that one of the disks attached to the SE has severe early-prediction failure (for example, the disk has failed SMART self-check).

Recommended Action    Back up data immediately on the disk to prevent data loss, and replace the disk after it is marked bad by the SE.

Alarm 445005 (disk_softraidcrit) A SoftRAID device has malfunctioned and requires immediate action.

Explanation    The SYSMON issues this alarm to indicate that a SoftRAID device has malfunctioned (for example, both component disks of a RAID-1 array have become inaccessible or faulty).

Recommended Action    Replace the disks and restore the data from backup storage, or remanufacture and reload the disks.

Alarm 700004 (device_offline_alarm) Device is offline. Re-register device to cdsm is strongly recommended.

Explanation    The device is offline.

Recommended Action    Check the device or network status. It may be necessary to re-register the device to the CDSM.

Alarm 700005 (rep_status_failed) Replication status failed.

Explanation    Replication status has failed.

Recommended Action    Check all SEs assigned to the delivery service.

Major Alarms

Alarm 100002 (ManifestFetchFail) Fail to fetch manifest file for Delivery Service.

Explanation    There is a problem fetching the manifest file for this delivery service.

Recommended Action    Log in to the CA, execute the show stat acq err command to check the problem, and resolve the problem.

Alarm 100003 (ManifestParseFail) Fail to parse manifest file for Delivery Service.

Explanation    There are some syntax errors in the manifest file for this delivery service.

Recommended Action    Log in to the CA, execute the show stat acq err command to check the problem, and resolve the problem.

Alarm 100005 (ExceedQuota) Total content size could not fit into the Delivery Service disk quota.

Explanation    The total content size acquired for this delivery service is larger than allowed from the delivery service disk quota.

Recommended Action    Either remove some contents from the manifest file, or increase the delivery service disk quota.

Alarm 100006 (CrawlStartUrlFail) The start-url for a crawl job in the Delivery Service failed.

Explanation    There is a problem fetching the start URL of a crawl job in this delivery service.

Recommended Action    Log in to the CA, execute the show status acquirer error command to check the problem, and resolve the problem.

Alarm 100007 (ContentFail) There are some contents that failed to be acquired.

Explanation    There are some contents that failed to be acquired.

Recommended Action    Log in to the CA, execute the show status acquirer error command to check the problem, and resolve the problem.

Alarm 213501 (svcnomcastenable) Alarm multicast is disabled although the SE is a multicast sender and receiver, or it is subscribed to a multicast Delivery Service.

Explanation    The unicast data receiver issues this alarm to indicate that the device does not have multicast service enabled, although it is expected to be involved in multicast distribution.

Recommended Action    Enable the multicast license and service on the device.

Alarm 330003 (servicedead) - service name - service died.

Explanation    The node manager found the specified service to be dead. Attempts are made to restart this service.

Recommended Action    Examine the syslog for messages relating to the cause of service death. The alarm is cleared if the service stays alive and does not restart soon.

Alarm 335002 (test) NHM Alarm Testing [string].

Explanation    This alarm is used for testing the Node Health Manager.

Recommended Action    None. This alarm should never occur during normal operation.

Alarm 335004 (test2) NHM Alarm Testing [string].

Explanation    This alarm is used for testing the Node Health Manager.

Recommended Action    None. This alarm should never occur during normal operation.

Alarm 335009 (test2) NHM Alarm Testing [string].

Explanation    This alarm is used for testing the Node Health Manager.

Recommended Action    None. This alarm should never occur during normal operation.

Alarm 445001 (core_dump) A User Core file or Kernel Crash dump has been generated.

Explanation    The SYSMON issues this alarm to indicate that one or more of the software modules or the kernel has generated core files.

Recommended Action    Access the device and check the /local1/core_dir or /local1/crash directory, retrieve the core file through FTP, and contact Cisco Technical Assistance Center (TAC).

Alarm 445003 (disk_smartfailmajor) An SE disk has early-prediction failure.

Explanation    The SYSMON issues this alarm to indicate that one of the disks attached to the SE has early-prediction failure. This alarm indicates the disk could fail in the near future.

Recommended Action    Make proper preparations for the incoming disk drive failure, such as making data backups and preparing a replacement disk.

Alarm 445010 (local) Directory /local1 usage exceeds the threshold.


Note This alarm is only on the Cisco Internet Streamer CDS 2.5.7 Release software and later.


Explanation    Usage of the /local1 directory has reached the 80% threshold. If this directory runs out of space, some applications will not work properly. Clean up the files under /local1 now; otherwise, the system will automatically delete log files to save space.

Recommended Action    Clean up the files under /local1 to save space.

Alarm 520004 (GroupDown) - group - Specified standby group is down.

Explanation    None of the member interfaces in the specified standby group could be brought up.

Recommended Action    Check the member interfaces configuration and cabling.

Alarm 540002 (linkdown) Network interface is inactive or down.

Explanation    The network interface is inactive or down.

Recommended Action    Check the cables connected to the network device.

Alarm 540003 (speed_mismatch) An alarm is raised for a portchannel if an interface within a portchannel has a different negotiated data rate than the rest of the interfaces in the portchannel.

Explanation    Speed mismatch among interfaces assigned to portchannel.

Recommended Action    Check the switch settings and verify cables are connected.

Alarm 550001 (SEKeepalive) SE keepalive timed-out or SE is not reachable.

Explanation    Either the SR has not received keepalives from the SE, or the SE is not reachable.

Recommended Action    Check the cables connected to the network device and the SE.

Alarm 560001 (threshold) Service monitor Cpu threshold exceeded.

Explanation    The Service Monitor CPU threshold has been exceeded.

Recommended Action    Check the file /tmp/threshold_exceeded.txt.

Alarm 560002 (threshold) Service monitor memory threshold exceeded.

Explanation    The Service Monitor memory threshold has been exceeded.

Recommended Action    Check the file /tmp/threshold_exceeded.txt.

Alarm 560003 (threshold) Service monitor kernel memory threshold exceeded.

Explanation    The Service Monitor kernel memory threshold has been exceeded.

Recommended Action    Check the file /tmp/threshold_exceeded.txt.

Alarm 560004 (threshold) Service monitor NIC threshold exceeded.

Explanation    The Service Monitor Network Interface Card (NIC) threshold has been exceeded.

Recommended Action    Check the file /tmp/threshold_exceeded.txt.

Alarm 560005 (threshold) Service monitor Disk threshold exceeded.

Explanation    The Service Monitor disk threshold has been exceeded.

Recommended Action    Check the file /tmp/threshold_exceeded.txt.

Alarm 560006 (threshold) Service monitor Disk Failure count threshold exceeded.

Explanation    The Service Monitor disk failure count threshold has been exceeded.

Recommended Action    Check the file /tmp/threshold_exceeded.txt.

Alarm 661001 (svclowdisk) Alarm database is running low in disk space in the STATEFS partition.

Explanation    The database monitor service issues this alarm to indicate that it is running low in disk space in the STATEFS partition, and therefore content replication service (acquisition and distribution) has been temporarily stopped.

Recommended Action    Execute the cms database maintenance command or schedule database maintenance more frequently to reclaim the disk space.

Alarm 700002 (cms_clock_alarm) The device clock is not synchronized with the primary Content Delivery System Manager (CDSM). Enabling NTP on all the devices is strongly recommended.

Explanation    If this device is an SE, its clock must be synchronized with the primary CDSM to make replication status, statistics monitoring, and program files work correctly. If this device is a standby CDSM, its clock must be synchronized with the primary CDSM to make the CDSM failover work.

Recommended Action    Fix the clock on either this device or the primary CDSM.

Alarm 850001 (cdnfs_db_corrupt) The total cached entries is more than the total number of CDNFS entries, which is an inconsistent state in the system.

Explanation    The UNS journal file used for bookkeeping content information on the disks has been corrupted.

Recommended Action    Execute the cdnfs database recover command after consulting Cisco TAC, and reload the server to ensure consistency.

Minor Alarms

Alarm 100001 (zerobandwidth) specified content acquisition bandwidth is 0.

Explanation    The device has been assigned as CA for some delivery services, but its acquisition bandwidth is 0.

Recommended Action    On the CDSM Devices page, select this device, select Edit, select the Preposition link on the left of the screen, and then change its default bandwidth.

Alarm 100004 (ManifestUpdateFail) Fail to recheck manifest file for Delivery Service.

Explanation    There is a problem rechecking the manifest file for this delivery service.

Recommended Action    Log in to the CA, execute the show status acquirer error command to check the problem, and resolve the problem.

Alarm 100008 (ContentUpdateFail) There are some contents that failed to be rechecked.

Explanation    There are some contents that failed to be rechecked.

Recommended Action    Log in to the CA, execute the show status acquirer error command to check the problem, and resolve the problem.

Alarm 100009 (ManifestParseWarning) Fail to parse manifest file for Delivery Service.

Explanation    There are some syntax warnings in the manifest file for this delivery service.

Recommended Action    Log in to the CA, execute the show status acquirer error command to display the warnings, and resolve the problem.

Alarm 212500 (svcbwclosed) Alarm Dout bandwidth is set to zero while jobs are scheduled.

Explanation    The unicast data sender issued this alarm to indicate that the Dout is scheduled to be zero, but currently the unicast data sender has a job running.

Recommended Action    Access the CDSM and determine if the bandwidth values and bandwidth schedules are correctly configured, and verify on the device the effective bandwidth and job statistics.

Alarm 213500 (svcbwclosed) Alarm Din bandwidth is set to zero while jobs are scheduled.

Explanation    The unicast data receiver issued this alarm to indicate that the Din is scheduled to be zero, but currently the unicast data receiver has a job scheduled or running.

Recommended Action    Access the CDSM and determine if the bandwidth values and bandwidth schedules are correctly configured, and verify on the device the effective bandwidth and job statistics.

Alarm 213502 (svcnomcastconnectivity) There is no multicast network connectivity between the multicast sender and this device.

Explanation    The unicast data receiver issues this alarm to indicate that the device as multicast receiver cannot receive Pragmatic General Multicast packets from a multicast sender. There is no multicast network connectivity between the multicast sender and this device.

Recommended Action    Check and fix the multicast network connectivity between the sender and the receiver.

Alarm 213503 (svcunsspaceproblem) There is a UNS problem while replicating and so some NACKs are suppressed.

Explanation    The unicast data receiver issues this alarm to indicate that the device as multicast receiver cannot receive files due to a problem with the Unified Name Space (UNS). It stops sending NACKs for the UNS failed files.

Recommended Action    Check and fix the UNS-related issues in the multicast receiver SE.

Alarm 213504 (svcnacksuppressed) Alarm that Multicast Receiver has stopped NACKs due to heavy loss.

Explanation    The unicast data receiver issues this alarm to indicate that the device as multicast receiver cannot receive multicast files for some considerable time and has stopped sending NACKs for the files.

Recommended Action    Check the multicast network for any problems. The sending of NACKs starts after at least one file is successfully received.

Alarm 215003 (svcdevfailover) Alarm backup multicast sender has been activated.

Explanation    The backup multicast sender issues this alarm to indicate that it has been activated and that either the primary multicast sender has a problem, or the primary and backup multicast senders cannot communicate with each other due to possible network connection issues.

Recommended Action    Troubleshoot the multicast sender service on the primary multicast sender and check the network connectivity between the primary and backup multicast senders.

Alarm 215500 (svcbwclosed) Alarm Mout bandwidth is set to zero while jobs are scheduled.

Explanation    The multicast data sender issues this alarm to indicate that the device has Mout scheduled to be zero, but currently the multicast data sender has a job scheduled or running.

Recommended Action    Access the CDSM and determine if the bandwidth values and bandwidth schedules are correctly configured, and verify on the device the effective bandwidth and job statistics.

Alarm 330004 (servicedead) - service name - service died.

Explanation    The node manager found the specified service to be dead. Attempts are made to restart this service.

Recommended Action    Examine the syslog for messages relating to the cause of service death. The alarm is cleared if the service stays alive and does not restart in a short while.

Alarm 335005 (test3) NHM Alarm Testing [string].

Explanation    This alarm is used for testing the Node Health Manager.

Recommended Action    None. This alarm should never occur during normal operation.

Alarm 335007 (test5) NHM Alarm Testing [string].

Explanation    This alarm is used for testing the Node Health Manager.

Recommended Action    None. This alarm should never occur during normal operation.

Alarm 400000 (wesvcthresholdexceeded) WebEngine has reached service threshold limits.


Note In Release 2.5.7 and later releases, this alarm is not supported for the Web Engine.


Explanation    The Web Engine service has reached license limits, or the limits were configured with the webengine max-concurrent-sessions command.

Recommended Action    Avoid further service requests to this device.

Alarm 445000 (disk_failure) An SE disk has failed.

Explanation    The SYSMON issues this alarm to indicate that one of the disks attached to the SE is not responding.

Recommended Action    Access the device and execute the show disk details command. If the problem persists, replace the disk.

Alarm 445004 (disk_smartfailminor) An SE disk has minor early-prediction failure.


Note This alarm is only on the Cisco Internet Streamer CDS 2.5.9 Release software and later.


Explanation    The SYSMON issues this alarm to indicate that one of the disks attached to the SE has a minor early-prediction failure. It warns that the disk may fail soon.

Recommended Action    Monitor the disk for early indications of errors occurring. If more severe SMART errors occur, or if disk errors occur, take the appropriate action.

Alarm 445006 (SoftRAID_Event) A SoftRAID device has become degraded and requires immediate action.

Explanation    The SYSMON issues this alarm to indicate that a SoftRAID device has become degraded (for example, one disk of a RAID-1 array has become inaccessible or faulty).

Recommended Action    If the system suspects an inconsistency in the RAID volume, it will initiate a resync to restore the volume's integrity. Check the RAID status using the show disk raid command to verify whether a disk failure or resync is occurring. For a resync, wait for the sync(s) to complete. For a degraded array, replace the disk.

Alarm 445007 (psu_down) A power supply power cable is unplugged or the power supply has failed.

Explanation    The System Monitor issues this alarm to indicate that at least one power supply failed or is unplugged.

Recommended Action    Check the back of the CDE and locate the power supplies. Verify the power cables are plugged in and replace any failed power supplies.

Alarm 445009 (system_hitemp) System Temperature Warning!

Explanation    The System Monitor issues this alarm to indicate that the motherboard sensor reports high temperatures.

Recommended Action    Check the temperature of the lab and the airflow inside the CDE.

Alarm 445011 (badsector) Bad sector on disk.


Note This alarm is only on the Cisco Internet Streamer CDS 2.5.9 Release software and later.


Explanation    The system encountered a corrupted disk sector, which it may or may not have been able to identify in the alarm description.

Recommended Action    Contact Cisco TAC. The sector might be recoverable through a process that requires reformatting the disk.

Alarm 445015 (filesystem_failure) A filesystem error has occurred.


Note This alarm is only on the Cisco Internet Streamer CDS 2.5.9 Release software and later.


Explanation    The System Monitor issues this alarm to indicate that an unexpected filesystem error occurred.

Recommended Action    Access the device, collect the syslog, and contact Cisco TAC.

Alarm 511010 (svcthresholdexceeded) WMT has reached service threshold limits.

Explanation    The Windows Media Technologies (WMT) service has reached license limits, or the limits configured with the wmt max-concurrent-sessions or bandwidth wmt outgoing commands.

Recommended Action    Avoid further service requests to this device.

Alarm 511011 (fmsthresholdexceeded) FMS has reached service threshold limits.

Explanation    FMS service has reached concurrent connection limits.

Recommended Action    Avoid further service requests to this device, or contact Cisco Technical Assistance Center for more connection licenses.

Alarm 511012 (mssvcthresholdexceeded) Movie Streamer has reached service threshold limits.

Explanation    The Movie Streamer service has reached license limits or configured limits.

Recommended Action    Avoid further service requests to the device.

Alarm 511013 (encoderfailure) Encoder:/program_IP_Address/encoder_name/encoder_url :failed.

Explanation    The encoder is stopped, is down, or has encountered a problem.

Recommended Action    Check that the encoder is working correctly, or use a different, working encoder. Run the show alarms detail support command to display the encoder information shown in the alarm.

Alarm 520001 (LinkDown) -group-ifc-slot-port- Specified interface in the standby group is down.

Explanation    The specified interface in the standby group is down. There could have been a link failure on the interface or it may have been shut down on purpose.

Recommended Action    Check the configuration and cabling of the specified interface.

Alarm 520002 (RouteDown) -group-ifc-slot-port- Unable to reach the configured default gateway on the specified interface.

Explanation    Unable to reach the configured default gateway on the specified interface in the standby group.

Recommended Action    Check the network configuration on the specified interface.

Alarm 520003 (MaxError) -group-ifc-slot-port- The specified interface has seen errors exceeding maximum allowable error count.

Explanation    The specified interface has seen errors exceeding the maximum allowable error count.

Recommended Action    Check the cabling or configuration of the specified interface.

Alarm 540001 (shutdown) Network interface is shutdown.

Explanation    The network interface is shut down.

Recommended Action    Check the interface configuration.

Alarm 700001 (cms_test_alarm) CMS test alarm with instance value - instance was raised. The title is used in the CDSM GUI.

Explanation    This is a test alarm defined and used in CMS code. This alarm is identified by a tuple (340001, instance). This means the system may have several raised alarms with the 340001 ID having different instance values. Instance is usually used to link an alarm to a particular data item (such as a particular failed disk, or a delivery service having A&D troubles).

Recommended Action    Advise the user how to handle this raised alarm. This is shown in the CDSM GUI or command-line interface (CLI).
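
The (alarm ID, instance) identification described for this alarm can be illustrated with a minimal Python sketch. It assumes a simple in-memory table; the class and method names are hypothetical and are not part of the CMS code. It shows only why one alarm ID can be raised several times, once per instance value.

```python
class ActiveAlarms:
    """Hypothetical table of raised alarms keyed by (alarm ID, instance)."""

    def __init__(self):
        self._alarms = {}

    def raise_alarm(self, alarm_id, instance, description):
        # Raising the same (ID, instance) pair again only updates the entry.
        self._alarms[(alarm_id, instance)] = description

    def clear_alarm(self, alarm_id, instance):
        # Returns True if a matching alarm was raised and is now cleared.
        key = (alarm_id, instance)
        if key in self._alarms:
            del self._alarms[key]
            return True
        return False

    def instances(self, alarm_id):
        # All instance values currently raised for one alarm ID.
        return sorted(inst for (aid, inst) in self._alarms if aid == alarm_id)
```

Keying on the pair rather than on the ID alone means that clearing the alarm for one data item leaves alarms for other data items untouched.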

Alarm 700003 (rea_alarm) REA agent alarm was raised.

Explanation    The Remote Execution Agent (REA) raised an alarm. This may lead to services such as URL deletion not functioning.

Recommended Action    Restart the REA from the CLI.

Alarm 900001 (memory_exceed) Web Engine memory exceeds the threshold value.

Explanation    The Web Engine reached the memory threshold.

Recommended Action    Avoid further Web Engine service requests to this SE.

Alarm 900002 (max_session_exceed) Web Engine concurrent sessions exceeds the threshold value.

Explanation    The Web Engine reached the session threshold (29000 concurrent sessions).

Recommended Action    Avoid further Web Engine service requests to this SE.

Alarm 1000010 (ManifestEmptyContent) Parsed Manifest file does not have any items to process.

Explanation    There are no single or crawl items mentioned in the manifest file to process.

Recommended Action    Edit the manifest file of this delivery service to have one or more items to process.

SNMP Alarm Traps

Cisco Internet Streamer Release 2.5 software supports six generic alarm traps. Table 2-1 presents the trap number and trap type of the six generic alarm traps. Alarm traps sent from a CDS device contain a numeric alarm identifier, a trap number, a module identifier, and a category identifier. To enable the CDS device to send SNMP alarm traps for a specific alarm condition, use the snmp-server enable traps command. You can configure the generation of alarm traps based on the severity of the alarm and on whether the alarm is raised or cleared.

Table 2-1 Generic Alarm Traps

Trap Number    Trap Type
7              Critical alarm raised
8              Critical alarm cleared
9              Major alarm raised
10             Major alarm cleared
11             Minor alarm raised
12             Minor alarm cleared
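
As an illustration of how a trap receiver might use Table 2-1, the following Python sketch maps a trap number to its severity and raised/cleared state. The function name describe_trap is hypothetical; only the numbers and trap types come from the table.

```python
# Trap numbers and trap types as listed in Table 2-1.
ALARM_TRAPS = {
    7: ("critical", "raised"),
    8: ("critical", "cleared"),
    9: ("major", "raised"),
    10: ("major", "cleared"),
    11: ("minor", "raised"),
    12: ("minor", "cleared"),
}


def describe_trap(trap_number):
    """Return the trap type for a generic alarm trap number (hypothetical helper)."""
    try:
        severity, state = ALARM_TRAPS[trap_number]
    except KeyError:
        raise ValueError(f"{trap_number} is not a generic alarm trap number") from None
    return f"{severity.capitalize()} alarm {state}"
```

Numbers outside the range 7 through 12 are rejected rather than guessed at, because the table defines only these six traps.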


Table 2-2 below presents the mapping of module names to module identifiers.

Table 2-2 Mapping of Module Names to Module Identifiers

Module Name                   Module Identifier
Acquirer                      4000
Active Directory Database     8000
Content Management Service    3000
Flash Media Streaming         4500
Movie Streamer                4750
Multicast data sender         7000
Node Health Manager           1
Node Health Manager 2         500
Network Interface Card        5500
Node Manager                  2000
Remote Execution Agent        3500
Service Router                5600
Standby                       4000
Service Monitor               5700
System Monitor                1000
Unicast data receiver         5000
Unicast data sender           6000
Web Engine                    2500
Windows Media Technologies    9000
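
A trap receiver usually needs the reverse of Table 2-2: given the module identifier in a trap, find the module name. The Python sketch below is one way to do that, using the table values as published; the function name module_names is hypothetical. Because the table lists identifier 4000 for both Acquirer and Standby, the lookup returns a list of names instead of a single name.

```python
from collections import defaultdict

# Module names and identifiers as listed in Table 2-2.
MODULE_IDS = {
    "Acquirer": 4000,
    "Active Directory Database": 8000,
    "Content Management Service": 3000,
    "Flash Media Streaming": 4500,
    "Movie Streamer": 4750,
    "Multicast data sender": 7000,
    "Node Health Manager": 1,
    "Node Health Manager 2": 500,
    "Network Interface Card": 5500,
    "Node Manager": 2000,
    "Remote Execution Agent": 3500,
    "Service Router": 5600,
    "Standby": 4000,
    "Service Monitor": 5700,
    "System Monitor": 1000,
    "Unicast data receiver": 5000,
    "Unicast data sender": 6000,
    "Web Engine": 2500,
    "Windows Media Technologies": 9000,
}

# Reverse index: one identifier may map to more than one module name.
NAMES_BY_ID = defaultdict(list)
for name, module_id in MODULE_IDS.items():
    NAMES_BY_ID[module_id].append(name)


def module_names(module_id):
    """Return every module name listed for an identifier (hypothetical helper)."""
    return list(NAMES_BY_ID.get(module_id, []))
```

An empty list signals an identifier that does not appear in the table.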


Table 2-3 below presents the mapping of category names to category identifiers.

Table 2-3 Mapping of Category Names to Category IDs

Category Name       Category Identifier
Communications      1
Service Quality     2
Processing Error    3
Equipment           4
Environment         5
Content             6
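
The four fields that an alarm trap carries (alarm identifier, trap number, module identifier, and category identifier) can be modeled as a small record. The Python sketch below is illustrative only: the AlarmTrap and AlarmCategory names are hypothetical, and the sketch does not describe the SNMP encoding of the trap. The category values are taken from Table 2-3.

```python
from dataclasses import dataclass
from enum import IntEnum


class AlarmCategory(IntEnum):
    """Category identifiers as listed in Table 2-3."""
    COMMUNICATIONS = 1
    SERVICE_QUALITY = 2
    PROCESSING_ERROR = 3
    EQUIPMENT = 4
    ENVIRONMENT = 5
    CONTENT = 6


@dataclass(frozen=True)
class AlarmTrap:
    """Hypothetical container for the four fields an alarm trap carries."""
    alarm_id: int
    trap_number: int
    module_id: int
    category_id: int

    @property
    def category(self):
        # Raises ValueError if the identifier is not in Table 2-3.
        return AlarmCategory(self.category_id)
```

Storing the raw category_id and converting on access keeps the record usable even when a trap carries an identifier that the table does not define.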