Cisco H.323 Signaling Interface User Guide, Release 4.3
Troubleshooting Cisco HSI Alarms

Table Of Contents

Troubleshooting Cisco HSI Alarms

Introduction

Alarms Overview

Debounce

Alarm Severity Levels

Retrieving and Reporting Alarms

Informational Event Requirements

SNMP Trap Types

Retrieving Alarm Messages

Acknowledging and Clearing Alarms

Alarms List

Troubleshooting

H323_STACK_FAILURE

Description

Severity Level and Trap Type

Cause

Troubleshooting

CONFIGURATION_FAILURE

Description

Severity Level and Trap Type

Cause

Troubleshooting

EISUP_PATH_FAILURE

Description

Severity Level and Trap Type

Cause

Troubleshooting

GATEKEEPER_INTERFACE_FAILURE

Description

Severity Level and Trap Type

Cause

Troubleshooting

GENERAL_PROCESS_FAILURE

Description

Severity Level and Trap Type

Cause

Troubleshooting

IP_LINK_FAILURE

Description

Severity Level and Trap Type

Cause

Troubleshooting

LOW_DISK_SPACE

Description

Severity Level and Trap Type

Cause

Troubleshooting

OVERLOAD_LEVEL3

Description

Severity Level and Trap Type

Cause

Troubleshooting

VSC_FAILURE

Description

Severity Level and Trap Type

Cause

Troubleshooting

HSI_NEEDS_UNLOCKING

Description

Severity Level and Trap Type

Cause

Troubleshooting

OVERLOAD_LEVEL2

Description

Severity Level and Trap Type

Cause

Troubleshooting

CONFIG_CHANGE

Description

Severity Level and Trap Type

Cause

Troubleshooting

ENDPOINT_CALL_CONTROL_INTERFACE_FAILURE

Description

Severity Level and Trap Type

Cause

Troubleshooting

ENDPOINT_CHANNEL_INTERFACE_FAILURE

Description

Severity Level and Trap Type

Cause

Troubleshooting

GAPPED_CALL_NORMAL

Description

Severity Level and Trap Type

Cause

Troubleshooting

GAPPED_CALL_PRIORITY

Description

Severity Level and Trap Type

Cause

Troubleshooting

OVERLOAD_LEVEL1

Description

Severity Level and Trap Type

Cause

Troubleshooting

PROVISIONING_INACTIVITY_TIMEOUT

Description

Severity Level and Trap Type

Cause

Troubleshooting

PROVISIONING_SESSION_TIMEOUT

Description

Severity Level and Trap Type

Cause

Troubleshooting

STOP_CALL_PROCESSING

Description

Severity Level and Trap Type

Cause

Troubleshooting


Troubleshooting Cisco HSI Alarms


Revised: April, 2010, OL-11616-08

Introduction

This chapter contains information about Cisco H.323 Signaling Interface (HSI) alarms, troubleshooting procedures for these alarms, and information about detailed logging. This chapter contains the following sections:

Alarms Overview

Retrieving Alarm Messages

Acknowledging and Clearing Alarms

Alarms List

Troubleshooting

Alarms Overview

An alarm can be in one of the following states:

Raised, when a persistent fault occurs in the system

Cleared, when the fault is fixed

Debounce

The alarms have a timeout (debounce) period. The debounce period is the time that elapses before an alarm condition is accepted. Use the ALARMDEBOUNCETIME parameter to set the debounce period (see Chapter 3, "Provisioning the Cisco HSI"). The default debounce period is 0.
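
The debounce behavior can be sketched as follows. This is a minimal illustration, not the Cisco HSI implementation: the class name DebouncedAlarm and its update method are hypothetical, and the sketch assumes that the period is expressed in seconds and that a fault must persist for the whole period before the alarm is raised.

```python
class DebouncedAlarm:
    """Hypothetical model of an alarm with a debounce period (assumed seconds)."""

    def __init__(self, debounce_seconds=0):
        self.debounce_seconds = debounce_seconds
        self._fault_since = None  # time at which the fault was first observed
        self.raised = False

    def update(self, fault_present, now):
        """Feed the current fault status; return True while the alarm is raised."""
        if not fault_present:
            # Fault gone: drop any pending condition and clear the alarm.
            self._fault_since = None
            self.raised = False
        else:
            if self._fault_since is None:
                self._fault_since = now
            # Accept the condition only after it has persisted long enough.
            if now - self._fault_since >= self.debounce_seconds:
                self.raised = True
        return self.raised
```

With the default period of 0, the first call that reports a fault raises the alarm immediately. With a longer period, a fault that disappears before the period elapses never raises the alarm.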

Alarm Severity Levels

The Cisco HSI generates autonomous messages, or events, to notify you of problems or atypical network conditions. Depending on the event severity level, events are considered alarms or informational events. Table 5-1 lists the severity levels and the required responses.

Table 5-1 Alarm Severity Levels

Severity Level    Description

Critical          A serious problem exists in the network. Clear critical alarms immediately. A critical alarm should force an automatic restart of the application.

Major             A disruption of service has occurred. Clear this alarm immediately.

Minor             No disruption of service has occurred, but clear this alarm as soon as possible.

Informational     An abnormal condition has occurred. It is transient and does not require corrective action by the management center. (An invalid protocol call state transition is an example of an event that prompts such an alarm.)


Retrieving and Reporting Alarms

Events with a severity level of critical, major, or minor are classified as alarms and can be retrieved through the Man-Machine Language (MML) interface and a Simple Network Management Protocol (SNMP) manager.

An alarm must be reported when an alarm state changes (assuming the alarm does not have an unreported severity).
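
The split between alarms and informational events can be expressed as a small lookup. This is only a sketch of the rule stated above; the function name classify_event is hypothetical, and accepting both "informational" and "information" as spellings is an assumption based on the two forms used in this chapter.

```python
# Severity levels that make an event an alarm, per the rule above.
ALARM_SEVERITIES = {"critical", "major", "minor"}

def classify_event(severity):
    """Return 'alarm' or 'informational event' for a severity level name."""
    level = severity.strip().lower()
    if level in ALARM_SEVERITIES:
        return "alarm"
    if level in {"informational", "information"}:
        return "informational event"
    raise ValueError(f"unknown severity level: {severity!r}")
```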

Informational Event Requirements

Informational events do not require state changes. An informational event is a warning that an abnormal but transient condition has occurred, such as an invalid protocol call state transition. The event must be reported, but no corrective action is required by the management center.

An informational event is reported once, upon occurrence, through the MML and SNMP interfaces. The MML interface must be in the rtrv-alms:cont mode for the event to be displayed. The event is not displayed in subsequent rtrv-alms commands.

SNMP Trap Types

Alarms have SNMP trap types associated with them. Table 5-2 identifies the trap types.

Table 5-2 SNMP Trap Types 

Trap Type    Trap Description

0            No error
1            Communication alarm
2            Quality of service
3            Processing error
4            Equipment error
5            Environment error
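
For scripts that post-process traps, the trap types in Table 5-2 can be held in a lookup table. The sketch below is illustrative only; the names TRAP_TYPES and describe_trap_type are hypothetical, and the "Unknown trap type" fallback is an assumption for values that Table 5-2 does not list.

```python
# Trap type numbers and descriptions, copied from Table 5-2.
TRAP_TYPES = {
    0: "No error",
    1: "Communication alarm",
    2: "Quality of service",
    3: "Processing error",
    4: "Equipment error",
    5: "Environment error",
}

def describe_trap_type(trap_type):
    """Return the Table 5-2 description, or a fallback for unlisted values."""
    return TRAP_TYPES.get(trap_type, "Unknown trap type")
```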


Retrieving Alarm Messages

Alarms can be displayed in noncontinuous mode or in continuous mode.

To display all current alarms, use the rtrv-alms MML command.

Figure 5-1 shows an example of an alarm message displayed with the rtrv-alms MML command (noncontinuous mode). For more information about the rtrv-alms MML command, see Appendix A, "MML User Interface and Command Reference."

Figure 5-1 Sample Alarm Message

The example in Figure 5-1 shows a Cisco Public Switched Telephone Network (PSTN) Gateway (PGW 2200) communication failure on the Cisco HSI that has the node ID H323-GW1. The resulting message is an alarm with a major severity level.

Acknowledging and Clearing Alarms

To acknowledge that an alarm is recognized but not cleared, use the ack-alm MML command. See Appendix A, "MML User Interface and Command Reference," for more information.

To clear an alarm, use the clr-alm MML command. See Appendix A, "MML User Interface and Command Reference," for more information.

Alarms List

Table 5-3 lists alarms and information events. Troubleshooting information for each of the alarms and information events can be found in the "Troubleshooting" section.

Table 5-3 Alarms and Information Events 

Alarm Event and Reference                  Severity Level

H323_STACK_FAILURE                         Critical
CONFIGURATION_FAILURE                      Major
EISUP_PATH_FAILURE                         Major
GATEKEEPER_INTERFACE_FAILURE               Major
GENERAL_PROCESS_FAILURE                    Major
IP_LINK_FAILURE                            Major
LOW_DISK_SPACE                             Major
OVERLOAD_LEVEL3                            Major
VSC_FAILURE                                Major
HSI_NEEDS_UNLOCKING                        Major
OVERLOAD_LEVEL2                            Minor
CONFIG_CHANGE                              Information
ENDPOINT_CALL_CONTROL_INTERFACE_FAILURE    Information
ENDPOINT_CHANNEL_INTERFACE_FAILURE         Information
GAPPED_CALL_NORMAL                         Information
GAPPED_CALL_PRIORITY                       Information
OVERLOAD_LEVEL1                            Information
PROVISIONING_INACTIVITY_TIMEOUT            Information
PROVISIONING_SESSION_TIMEOUT               Information
STOP_CALL_PROCESSING                       Information


Troubleshooting

This section provides troubleshooting procedures for the alarms and information events listed in Table 5-3.

H323_STACK_FAILURE

Description

An irrecoverable failure has occurred in the RADVision stack. This alarm is reported to the management interface and can be obtained with SNMP.

Severity Level and Trap Type

The severity level is critical. The trap type is 4.

Cause

The H.323 RADVision stack has failed to correctly initialize on an application startup. An automatic application restart is initiated, and the application reverts to the base configuration data.

Troubleshooting

To clear the H.323 stack failure alarm, complete the following steps:


Step 1 Allow the application to restart and revert to the base configuration data that is known to be reliable.

Step 2 Review the H323_SYS parameters in a provisioning session, ensuring that the values are correct and within the memory limits of the machine.

Step 3 Use the prov-cpy MML command to recommit the new H323_SYS parameters.

Step 4 Use the restart-softw MML command to initiate a software restart.

Step 5 Use the rtrv-alms MML command to check the alarm list to see if the H.323 stack correctly initializes.


CONFIGURATION_FAILURE

Description

The configuration has failed. This alarm is reported to the management interface and can be obtained with SNMP.

Severity Level and Trap Type

The severity level is major. The trap type is 4.

Cause

A major error has occurred in the configuration of the software packages. This is a potentially nonrecoverable situation that requires an application restart.

Troubleshooting

To clear the CONFIGURATION_FAILURE alarm, complete the following steps:


Step 1 Use the restart-softw:init command to restart the application and revert to the base configuration.

Step 2 Review the modified parameters and ensure that the values are correct.

Step 3 Use the prov-cpy MML command to recommit the new parameters.

Step 4 Use the restart-softw MML command to initiate a software restart.

Step 5 Use the rtrv-alms MML command to check the alarm list to see if the problem has been resolved.


EISUP_PATH_FAILURE

Description

A failure of the RUDP layer has occurred. This alarm is reported to the management interface and can be obtained with SNMP.

Severity Level and Trap Type

The severity level is major. The trap type is 4.

Cause

Both IP links A and B to a single Cisco PGW 2200 have gone down.

Troubleshooting

To clear the EISUP_PATH_FAILURE alarm, complete the following steps:


Step 1 Use the rtrv-dest command to identify which Cisco PGW 2200 (standby or active) has been lost.

Step 2 Check the network connections, cables, and routers for that system.

Step 3 Use the clr-alm MML command to attempt to clear the alarm.


GATEKEEPER_INTERFACE_FAILURE

Description

The Cisco HSI failed to register with the gatekeeper. This alarm is reported to the management interface and can be obtained with SNMP.

Severity Level and Trap Type

The severity level is major. The trap type is 6.

Cause

The Cisco HSI failed to register several times (as configured in RAS, maxFail).

The Gatekeeper did not return a response to register requests.

Troubleshooting

In response to the GATEKEEPER_INTERFACE_FAILURE alarm, complete the following steps:


Step 1 Ensure that the correct Gatekeeper IP address and port are provisioned on the Cisco HSI.

Step 2 Ensure that the network route to the Gatekeeper is correct.

Step 3 Ensure that the Gatekeeper is working.


GENERAL_PROCESS_FAILURE

Description

A general process failure has occurred. This alarm is reported to the management interface and can be obtained with SNMP.

Severity Level and Trap Type

The severity level is major. The trap type is 4.

Cause

The Cisco HSI (GWmain program) quit unexpectedly (that is, there were no requests to stop or restart the application). The process manager (PMmain) raises the GENERAL_PROCESS_FAILURE alarm so that a trap is sent to the Cisco Media Gateway Controller Node Manager.

The process manager clears the GENERAL_PROCESS_FAILURE alarm when it restarts the Cisco HSI (GWmain).

Troubleshooting

To trace the problem, look at either the core file or the log files.

IP_LINK_FAILURE

Description

A failure of the IP link has occurred. This alarm is reported to the management interface and can be obtained with SNMP.

Severity Level and Trap Type

The severity level is major. The trap type is 4.

Cause

One of the two links to a single Cisco PGW 2200 has failed.

Troubleshooting

To clear the IP link failure alarm, complete the following steps:


Step 1 Use the rtrv-dest command to identify which Cisco PGW 2200 (standby or active) has been lost.

Step 2 Check the network connections, cables, and routers for that system.

Step 3 Use the clr-alm MML command to attempt to clear the alarm.


LOW_DISK_SPACE

Description

The disk space is low. This alarm is reported to the management interface and can be obtained with SNMP. The alarm automatically clears when the disk usage falls below the alarm limit.

Severity Level and Trap Type

The severity level is major. The trap type is 4.

Cause

The percentage of disk usage is greater than the alarm limit.

Troubleshooting

To obtain more disk space, remove old versions of installed software that are no longer required, or archive log files (for example, those in the $GWHOME/var/log directory).
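
The comparison behind this alarm can be reproduced from a script when checking how close a file system is to the limit. The sketch below is an assumption-laden illustration, not the Cisco HSI check: the function names and the limit_percent argument are hypothetical, and the actual alarm limit must be taken from the system configuration.

```python
import shutil

def disk_usage_percent(total_bytes, used_bytes):
    """Return disk usage as a percentage of total capacity."""
    return 100.0 * used_bytes / total_bytes

def low_disk_space(path, limit_percent):
    """Return True if usage of the file system holding path exceeds the limit.

    limit_percent is a hypothetical stand-in for the configured alarm limit.
    """
    usage = shutil.disk_usage(path)
    return disk_usage_percent(usage.total, usage.used) > limit_percent
```

Calling low_disk_space with the log directory path and the configured limit indicates whether archiving log files has brought usage back under the limit.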

OVERLOAD_LEVEL3

Description

An overload level 3 condition exists. This alarm is reported to the management interface and can be obtained with SNMP. This alarm automatically clears when the CPU occupancy or the number of active calls drops below the lower limits set in the overload configuration for level 3.

Severity Level and Trap Type

The severity level is major. The trap type is 4.

Cause

The OVERLOAD_LEVEL3 alarm is triggered when the CPU occupancy or the number of active calls rises above the upper limits set in the overload configuration for level 3. Gapping is then initiated.
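
Because the alarm is raised above an upper limit and cleared only below a lower limit, it behaves as a hysteresis switch. The sketch below shows that pattern for a single measurement (CPU occupancy or active calls). It is an illustration under stated assumptions: the function name is hypothetical, and the limit values in the usage note are made up rather than taken from any Cisco HSI configuration.

```python
def next_overload_state(active, value, lower_limit, upper_limit):
    """Return the new alarm state for one measurement, using hysteresis.

    active      -- True if the overload alarm is currently raised
    value       -- current measurement (for example, CPU occupancy in percent)
    lower_limit -- the alarm clears when the value drops below this limit
    upper_limit -- the alarm is raised when the value rises above this limit
    """
    if not active and value > upper_limit:
        return True    # rose above the upper limit: raise the alarm
    if active and value < lower_limit:
        return False   # dropped below the lower limit: clear the alarm
    return active      # between the limits: keep the current state
```

For example, with a hypothetical lower limit of 70 and upper limit of 90, a reading of 80 leaves a raised alarm raised and leaves a cleared alarm cleared, which prevents the alarm from toggling while the load hovers near one limit.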

Troubleshooting

To clear the OVERLOAD_LEVEL3 alarm, complete the following steps:


Step 1 Wait for the number of calls to drop.

Step 2 If CPU occupancy remains high, request assistance from the system administrator.


VSC_FAILURE

Description

This alarm is derived by the Cisco HSI application from RUDP/SM events. This alarm is reported to the management interface and can be obtained with SNMP.

Severity Level and Trap Type

The severity level is major. The trap type is 5.

Cause

Links to both (active and standby) Cisco PGW 2200s have gone down.
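
The causes given for IP_LINK_FAILURE, EISUP_PATH_FAILURE, and VSC_FAILURE form a hierarchy: one link down, both links to one Cisco PGW 2200 down, and links to both Cisco PGW 2200s down. The sketch below derives the alarm names from link states to make that relationship explicit. It is a hypothetical model, not the Cisco HSI logic: the function name and input structure are assumptions, and the sketch reports only the most specific condition for each PGW because this chapter does not state whether IP_LINK_FAILURE also remains raised when both links are down.

```python
def derive_link_alarms(link_up):
    """Derive alarm names from link states.

    link_up maps a PGW label (for example "active") to a pair
    (link_a_up, link_b_up). Returns a list of (alarm_name, pgw_label).
    """
    alarms = []
    paths_down = 0
    for pgw, (a_up, b_up) in sorted(link_up.items()):
        if not a_up and not b_up:
            # Both links to this PGW are down.
            alarms.append(("EISUP_PATH_FAILURE", pgw))
            paths_down += 1
        elif not a_up or not b_up:
            # Exactly one of the two links to this PGW is down.
            alarms.append(("IP_LINK_FAILURE", pgw))
    if link_up and paths_down == len(link_up):
        # No PGW is reachable on any link.
        alarms.append(("VSC_FAILURE", None))
    return alarms
```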

Troubleshooting

To clear the VSC_FAILURE alarm, complete the following steps:


Step 1 Use the rtrv-dest command to confirm that links to the Cisco PGW 2200s have gone down.

Step 2 Check the network connections, cables, and routers.

Step 3 Refer to the Cisco Media Gateway Controller Software Release 9 Operations, Maintenance, and Troubleshooting Guide for detailed information about this alarm.

Step 4 Use the clr-alm command to attempt to clear the alarm.


HSI_NEEDS_UNLOCKING

Description

The Cisco HSI is locked. This alarm is reported to the management interface and can be obtained with SNMP.

Severity Level and Trap Type

The severity level is major. The trap type is 4.

Cause

The HSI license is invalid or has expired.

Troubleshooting

To clear the HSI_NEEDS_UNLOCKING alarm, complete the following steps:


Step 1 Issue the rtrv-lics command to confirm that the HSI license is invalid or expired.

Step 2 Confirm that the license files under /opt/GoldWing/license/ are invalid or expired.

Step 3 Contact the Cisco Technical Assistance Center (TAC) engineers to get valid license files.

Step 4 Put valid license files in the directory /opt/GoldWing/license/.

Step 5 Restart the HSI and issue the rtrv-lics command to confirm that the license is valid.


OVERLOAD_LEVEL2

Description

An overload level 2 condition exists. This alarm is reported to the management interface and can be obtained with SNMP. This alarm automatically clears when the CPU occupancy or the number of active calls drops below the lower limits set in the overload configuration for level 2.

Severity Level and Trap Type

The severity level is minor. The trap type is 4.

Cause

The OVERLOAD_LEVEL2 alarm is triggered when the CPU occupancy or the number of active calls rises above the upper limits set in the overload configuration for level 2. Gapping is then initiated.

Troubleshooting

In response to the OVERLOAD_LEVEL2 alarm, complete the following steps:


Step 1 Wait for the number of calls to drop.

Step 2 If CPU occupancy remains high, request assistance from the system administrator.


CONFIG_CHANGE

Description

The running configuration has been modified.

Severity Level and Trap Type

The severity level is information. The trap type is 0.

Cause

A new configuration has been activated within a provisioning session.

Troubleshooting

This is an informational event.

ENDPOINT_CALL_CONTROL_INTERFACE_FAILURE

Description

An individual call failure has occurred. This informational event is reported to the management interface and can be obtained with SNMP.

Severity Level and Trap Type

The severity level is information. The trap type is 3.

Cause

The RADVision stack reports this alarm.

Troubleshooting

This is an informational event.

ENDPOINT_CHANNEL_INTERFACE_FAILURE

Description

An individual call failure has occurred. This informational event is reported to the management interface and can be obtained with SNMP.

Severity Level and Trap Type

The severity level is information. The trap type is 3.

Cause

The RADVision stack reports this alarm.

Troubleshooting

This is an informational event.

GAPPED_CALL_NORMAL

Description

A normal call has been rejected due to call gapping. This informational event is reported to the management interface and can be obtained with SNMP.

Severity Level and Trap Type

The severity level is information. The trap type is 2.

Cause

The GAPPED_CALL_NORMAL alarm is triggered when gapping levels cause a normal call to be rejected.

Troubleshooting

To clear the GAPPED_CALL_NORMAL informational event, complete the following steps:


Step 1 Use the rtrv-gapping MML command to retrieve gapping information.

Step 2 If the MML-specific gap levels are active, use the set-gapping MML command to modify them.

Step 3 If the overload-specific gap levels are active, either modify the provisioned overload gapping percent levels or reduce the cause of the overload (see OVERLOAD_LEVEL1, OVERLOAD_LEVEL2, and OVERLOAD_LEVEL3).


GAPPED_CALL_PRIORITY

Description

A priority or emergency call has been rejected due to call gapping. This informational event is reported to the management interface and can be obtained with SNMP.

Severity Level and Trap Type

The severity level is information. The trap type is 2.

Cause

The GAPPED_CALL_PRIORITY alarm is triggered when gapping levels cause a priority or emergency call to be rejected.

Troubleshooting

To clear the GAPPED_CALL_PRIORITY informational event, complete the following steps:


Step 1 Change the MML gapping levels to less than 100 percent and change the call type to normal.

Step 2 Change the provisioned overload call filter type to normal.


OVERLOAD_LEVEL1

Description

An overload level 1 condition exists. This informational event is reported to the management interface and can be obtained with SNMP.

Severity Level and Trap Type

The severity level is information. The trap type is 4.

Cause

The OVERLOAD_LEVEL1 alarm is triggered when the CPU occupancy or the number of active calls rises above the upper limits set in the overload configuration for level 1. Gapping is then initiated.

Troubleshooting

In response to the OVERLOAD_LEVEL1 informational event, complete the following steps:


Step 1 Wait for the number of calls to drop.

Step 2 If CPU occupancy remains high, request assistance from the system administrator.


PROVISIONING_INACTIVITY_TIMEOUT

Description

A provisioning session has been inactive for 20 minutes. The text of the output is:

"H323-GW1:2001-01-30 11:12:57.421,A^ ALM=\"PROVISIONING INACTIVITY TIMEOUT\",SEV=IF"

Severity Level and Trap Type

The severity level is information. The trap type is 3.

Cause

The provisioning session has been inactive for 20 minutes. The provisioning session will be closed if there is no activity within the next 5 minutes.
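
The two provisioning timers can be summarized as a function of idle time: a warning event after 20 minutes, then session closure 5 minutes later. The sketch below models only that timing. The function and constant names are hypothetical, and the exact boundary behavior (whether the event fires at or just after the limit) is an assumption.

```python
INACTIVITY_WARNING_SECONDS = 20 * 60  # event raised after 20 minutes idle
CLOSE_GRACE_SECONDS = 5 * 60          # session closed 5 minutes after that

def provisioning_session_event(idle_seconds):
    """Return the event expected for a session idle for idle_seconds, or None."""
    if idle_seconds >= INACTIVITY_WARNING_SECONDS + CLOSE_GRACE_SECONDS:
        return "PROVISIONING_SESSION_TIMEOUT"
    if idle_seconds >= INACTIVITY_WARNING_SECONDS:
        return "PROVISIONING_INACTIVITY_TIMEOUT"
    return None
```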

Troubleshooting

Ensure that activity in the provisioning session occurs at least every 20 minutes.

PROVISIONING_SESSION_TIMEOUT

Description

The current session has been terminated. The text of the output is:

"H323-GW1:2001-01-30 11:17:57.422,A^ ALM=\"PROVISIONING SESSION TIMEOUT\",SEV=IF"
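
Output text of this shape can be split into fields when alarm messages are collected by a script. The sketch below is inferred from the sample output alone and is not a documented message grammar: the field names, the reading of the leading token as the node ID, and the tolerance for backslash-escaped quotation marks are assumptions, and the "A^" token is skipped without interpretation. The text must be on a single line before it is parsed.

```python
import re
from datetime import datetime

# Pattern inferred from the sample output; not a documented grammar.
ALARM_LINE = re.compile(
    r'(?P<node>[^:"]+):'
    r'(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}),'
    r'.*?ALM=\\?"(?P<alarm>[^"\\]+)\\?",'
    r'SEV=(?P<sev>\w+)'
)

def parse_alarm_line(line):
    """Return a dict of fields from one alarm output line, or None if no match."""
    match = ALARM_LINE.search(line)
    if match is None:
        return None
    return {
        "node": match.group("node"),
        "time": datetime.strptime(match.group("time"), "%Y-%m-%d %H:%M:%S.%f"),
        "alarm": match.group("alarm"),
        "severity": match.group("sev"),
    }
```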

Severity Level and Trap Type

The severity level is information. The trap type is 3.

Cause

The provisioning session has been inactive for longer than the time allowed.

Troubleshooting

Ensure that activity within the provisioning session occurs at least every 20 minutes.

STOP_CALL_PROCESSING

Description

A stop call processing request has been entered through the MML.

Severity Level and Trap Type

The severity level is information. The trap type is 4.

Cause

A user has entered the stp-callproc command through the MML.

Troubleshooting

This is an informational event.