Troubleshooting Cisco HSI Alarms
Revised: April, 2010, OL-11616-08
Introduction
This chapter contains information about Cisco H.323 Signaling Interface (HSI) alarms, troubleshooting procedures for these alarms, and information about detailed logging. This chapter contains the following sections:
• Alarms Overview
• Retrieving Alarm Messages
• Acknowledging and Clearing Alarms
• Alarms List
• Troubleshooting
Alarms Overview
An alarm can be in one of the following states:
• Raised, when a persistent fault occurs in the system
• Cleared, when the fault is fixed
Debounce
Alarms have a debounce (timeout) period, which is the time that must elapse before an alarm condition is accepted. Use the ALARMDEBOUNCETIME parameter to set the debounce period (see Chapter 3, "Provisioning the Cisco HSI"). The default debounce period is 0.
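The following Python sketch illustrates the debounce rule. It assumes that a fault must persist for the entire debounce period before the alarm condition is accepted, and that the period is measured in seconds. The AlarmDebouncer class is hypothetical and is not part of the Cisco HSI. Only the ALARMDEBOUNCETIME parameter and its default of 0 come from this chapter.

```python
from typing import Optional


class AlarmDebouncer:
    """Hypothetical model of ALARMDEBOUNCETIME (seconds assumed): a fault is
    accepted only after it has persisted for the whole debounce period."""

    def __init__(self, debounce_time: float = 0.0) -> None:
        self.debounce_time = debounce_time
        self._fault_since: Optional[float] = None  # when the fault was first seen

    def update(self, fault_present: bool, now: float) -> bool:
        """Return True if the alarm condition is accepted at time `now`."""
        if not fault_present:
            # The fault went away before the period elapsed, so start over.
            self._fault_since = None
            return False
        if self._fault_since is None:
            self._fault_since = now
        return now - self._fault_since >= self.debounce_time


# With the default period of 0, a fault is accepted immediately.
immediate = AlarmDebouncer()
print(immediate.update(True, now=10.0))  # True

# With a 5-second period, the fault must persist for 5 seconds.
delayed = AlarmDebouncer(debounce_time=5.0)
print(delayed.update(True, now=0.0))  # False
print(delayed.update(True, now=3.0))  # False
print(delayed.update(True, now=5.0))  # True
```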
Alarm Severity Levels
The Cisco HSI generates autonomous messages, or events, to notify you of problems or atypical network conditions. Depending on the event severity level, events are considered alarms or informational events. Table 5-1 lists the severity levels and the required responses.
Table 5-1 Alarm Severity Levels
Severity Level | Description
Critical | A serious problem exists in the network. Clear critical alarms immediately. A critical alarm should force an automatic restart of the application.
Major | A disruption of service has occurred. Clear this alarm immediately.
Minor | No disruption of service has occurred, but clear this alarm as soon as possible.
Informational | An abnormal, transient condition has occurred that requires no corrective action by the management center. An invalid protocol call state transition is an example of such an event.
Retrieving and Reporting Alarms
Events with a severity level of critical, major, or minor are classified as alarms and can be retrieved through the Man-Machine Language (MML) interface and a Simple Network Management Protocol (SNMP) manager.
An alarm must be reported whenever its state changes, unless the alarm has an unreported severity.
Informational Event Requirements
Informational events do not involve state changes. An informational event is a warning that an abnormal condition has occurred that does not require corrective action. An invalid protocol call state transition is an example of an informational event. The informational event must be reported, but it is transient, and the management center does not need to take corrective action.
An informational event is reported once, upon occurrence, through the MML and SNMP interfaces. The MML interface must be in the rtrv-alms:cont mode for the event to be displayed. The event is not displayed in subsequent rtrv-alms commands.
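The reporting rules above can be modeled in a short Python sketch. The Severity, EventLog, and rtrv_alms names are hypothetical and only mirror the MML behavior described in this section. The sketch assumes a single MML session that is either in continuous mode or not.

```python
from enum import Enum


class Severity(Enum):
    CRITICAL = "critical"
    MAJOR = "major"
    MINOR = "minor"
    INFORMATIONAL = "informational"


# Only these severities make an event an alarm (Table 5-1).
ALARM_SEVERITIES = {Severity.CRITICAL, Severity.MAJOR, Severity.MINOR}


def is_alarm(severity: Severity) -> bool:
    return severity in ALARM_SEVERITIES


class EventLog:
    """Hypothetical model of one MML session's view of events."""

    def __init__(self, continuous_mode: bool) -> None:
        self.continuous_mode = continuous_mode  # True mimics rtrv-alms:cont
        self.active_alarms: dict[str, Severity] = {}
        self.continuous_output: list[str] = []

    def report(self, name: str, severity: Severity) -> None:
        if is_alarm(severity):
            # Alarms are kept so that later retrievals can list them.
            self.active_alarms[name] = severity
        if self.continuous_mode:
            # Continuous mode displays every event once, as it occurs.
            self.continuous_output.append(name)

    def rtrv_alms(self) -> list[str]:
        # Noncontinuous retrieval lists current alarms, never informational events.
        return sorted(self.active_alarms)


log = EventLog(continuous_mode=True)
log.report("VSC_FAILURE", Severity.MAJOR)
log.report("CONFIG_CHANGE", Severity.INFORMATIONAL)
print(log.rtrv_alms())        # ['VSC_FAILURE']
print(log.continuous_output)  # ['VSC_FAILURE', 'CONFIG_CHANGE']
```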
SNMP Trap Types
Alarms have SNMP trap types associated with them. Table 5-2 identifies the trap types.
Table 5-2 SNMP Trap Types
Trap Type | Trap Description
0 | No error
1 | Communication alarm
2 | Quality of service
3 | Processing error
4 | Equipment error
5 | Environment error
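For scripts that post-process SNMP traps, Table 5-2 reduces to a simple lookup. This Python sketch is illustrative. describe_trap is a hypothetical helper, and the fallback string for unlisted codes is an arbitrary choice.

```python
# Trap type codes from Table 5-2.
TRAP_TYPES = {
    0: "No error",
    1: "Communication alarm",
    2: "Quality of service",
    3: "Processing error",
    4: "Equipment error",
    5: "Environment error",
}


def describe_trap(trap_type: int) -> str:
    """Hypothetical helper: return the Table 5-2 description for a trap type."""
    return TRAP_TYPES.get(trap_type, f"Unlisted trap type {trap_type}")


print(describe_trap(4))  # Equipment error
```

Codes that are not in the table fall through to a fallback string instead of raising an error.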
Retrieving Alarm Messages
Alarms can be displayed in noncontinuous mode or in continuous mode.
To display all current alarms, use the rtrv-alms MML command.
Figure 5-1 shows an example of an alarm message displayed with the rtrv-alms MML command (noncontinuous mode). For more information about the rtrv-alms MML command, see Appendix A, "MML User Interface and Command Reference."
Figure 5-1 Sample Alarm Message
The example in Figure 5-1 shows a Cisco Public Switched Telephone Network (PSTN) Gateway (PGW 2200) communication failure on the Cisco HSI that has the node ID H323-GW1. The resulting message is an alarm with a major severity level.
Acknowledging and Clearing Alarms
To acknowledge that an alarm is recognized but not cleared, use the ack-alm MML command. See Appendix A, "MML User Interface and Command Reference," for more information.
To clear an alarm, use the clr-alm MML command. See Appendix A, "MML User Interface and Command Reference," for more information.
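The difference between acknowledging and clearing an alarm can be sketched as two independent flags on an alarm record. AlarmRecord is a hypothetical Python class that only mirrors the effect of ack-alm and clr-alm as described above. It does not issue MML commands.

```python
from dataclasses import dataclass


@dataclass
class AlarmRecord:
    """Hypothetical record showing how acknowledging differs from clearing."""

    name: str
    raised: bool = True
    acknowledged: bool = False

    def ack(self) -> None:
        # Like ack-alm: the alarm is recognized but remains raised.
        self.acknowledged = True

    def clear(self) -> None:
        # Like clr-alm: the alarm leaves the raised state.
        self.raised = False


alarm = AlarmRecord("IP_LINK_FAILURE")
alarm.ack()
print(alarm.raised, alarm.acknowledged)  # True True
alarm.clear()
print(alarm.raised)  # False
```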
Alarms List
Table 5-3 lists alarms and informational events. Troubleshooting information for each of the alarms and informational events can be found in the "Troubleshooting" section.
Troubleshooting
This section provides troubleshooting procedures for the alarms and informational events listed in Table 5-3.
H323_STACK_FAILURE
Description
An irrecoverable failure has occurred in the RADVision stack. This alarm is reported to the management interface and can be obtained with SNMP.
Severity Level and Trap Type
The severity level is critical. The trap type is 4.
Cause
The H.323 RADVision stack has failed to correctly initialize on an application startup. An automatic application restart is initiated, and the application reverts to the base configuration data.
Troubleshooting
To clear the H.323 stack failure alarm, complete the following steps:
Step 1
Allow the application to restart and revert to the base configuration data that is known to be reliable.
Step 2
Review the H323_SYS parameters in a provisioning session, ensuring that the values are correct and within the memory limits of the machine.
Step 3
Use the prov-cpy MML command to recommit the new H323_SYS parameters.
Step 4
Use the restart-softw MML command to initiate a software restart.
Step 5
Use the rtrv-alms MML command to check the alarm list to see if the H.323 stack correctly initializes.
CONFIGURATION_FAILURE
Description
The configuration has failed. This alarm is reported to the management interface and can be obtained with SNMP.
Severity Level and Trap Type
The severity level is major. The trap type is 4.
Cause
A major error has occurred in the configuration of the software packages. This is a potentially nonrecoverable situation that requires an application restart.
Troubleshooting
To clear the CONFIGURATION_FAILURE alarm, complete the following steps:
Step 1
Use the restart-softw:init command to restart the application and revert to the base configuration.
Step 2
Review the modified parameters and ensure that the values are correct.
Step 3
Use the prov-cpy MML command to recommit the new parameters.
Step 4
Use the restart-softw MML command to initiate a software restart.
Step 5
Use the rtrv-alms MML command to check the alarm list to see if the problem has been resolved.
EISUP_PATH_FAILURE
Description
A failure of the RUDP layer has occurred. This alarm is reported to the management interface and can be obtained with SNMP.
Severity Level and Trap Type
The severity level is major. The trap type is 4.
Cause
Both IP links A and B to a single Cisco PGW 2200 have gone down.
Troubleshooting
To clear the EISUP_PATH_FAILURE alarm, complete the following steps:
Step 1
Use the rtrv-dest command to identify which Cisco PGW 2200 (standby or active) has been lost.
Step 2
Check the network connections, cables, and routers for that system.
Step 3
Use the clr-alm MML command to attempt to clear the alarm.
GATEKEEPER_INTERFACE_FAILURE
Description
The Cisco HSI failed to register with the gatekeeper. This alarm is reported to the management interface and can be obtained with SNMP.
Severity Level and Trap Type
The severity level is major. The trap type is 6.
Cause
The Cisco HSI failed to register the number of times configured by the maxFail parameter in RAS.
The gatekeeper did not return a response to registration requests.
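The registration failure count can be sketched as a consecutive-failure counter compared against maxFail. This Python sketch assumes that the count is of consecutive unanswered registration requests, and that a later successful registration resets the count and clears the alarm. RegistrationMonitor and max_fail are hypothetical names; max_fail stands in for the RAS maxFail setting.

```python
class RegistrationMonitor:
    """Hypothetical counter: raise GATEKEEPER_INTERFACE_FAILURE after
    max_fail consecutive unanswered registration requests (RAS maxFail)."""

    def __init__(self, max_fail: int) -> None:
        self.max_fail = max_fail
        self.consecutive_failures = 0
        self.alarm_raised = False

    def on_attempt(self, got_response: bool) -> None:
        if got_response:
            # Assumption: a successful registration resets the count and clears the alarm.
            self.consecutive_failures = 0
            self.alarm_raised = False
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.max_fail:
            self.alarm_raised = True


monitor = RegistrationMonitor(max_fail=3)
for _ in range(3):
    monitor.on_attempt(got_response=False)
print(monitor.alarm_raised)  # True
```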
Troubleshooting
In response to the GATEKEEPER_INTERFACE_FAILURE alarm, complete the following steps:
Step 1
Ensure that the correct Gatekeeper IP address and port are provisioned on the Cisco HSI.
Step 2
Ensure that the network route to the Gatekeeper is correct.
Step 3
Ensure that the Gatekeeper is working.
GENERAL_PROCESS_FAILURE
Description
A general process failure has occurred. This alarm is reported to the management interface and can be obtained with SNMP.
Severity Level and Trap Type
The severity level is major. The trap type is 4.
Cause
The Cisco HSI (GWmain program) quit unexpectedly (that is, there were no requests to stop or restart the application). The process manager (PMmain) raises the GENERAL_PROCESS_FAILURE alarm so that a trap is sent to the Cisco Media Gateway Controller Node Manager.
The process manager clears the GENERAL_PROCESS_FAILURE alarm when it restarts the Cisco HSI (GWmain).
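The PMmain behavior described above can be sketched as follows. ProcessManager and its methods are hypothetical Python stand-ins for the process manager. The sketch assumes that an operator stop or restart request is recorded before GWmain exits, so that the exit can be recognized as expected.

```python
class ProcessManager:
    """Hypothetical stand-in for PMmain supervising GWmain."""

    def __init__(self) -> None:
        self.stop_requested = False
        self.general_process_failure = False
        self.traps_sent: list[str] = []

    def request_stop(self) -> None:
        # An operator stop or restart request makes the next exit expected.
        self.stop_requested = True

    def on_gwmain_exit(self) -> None:
        if self.stop_requested:
            return  # expected exit: no alarm
        # Unexpected exit: raise the alarm and record the trap that would be sent.
        self.general_process_failure = True
        self.traps_sent.append("GENERAL_PROCESS_FAILURE")

    def restart_gwmain(self) -> None:
        # Restarting GWmain clears the alarm.
        self.stop_requested = False
        self.general_process_failure = False


pm = ProcessManager()
pm.on_gwmain_exit()
print(pm.general_process_failure, pm.traps_sent)  # True ['GENERAL_PROCESS_FAILURE']
pm.restart_gwmain()
print(pm.general_process_failure)  # False
```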
Troubleshooting
To trace the problem, look at either the core file or the log files.
IP_LINK_FAILURE
Description
A failure of the IP link has occurred. This alarm is reported to the management interface and can be obtained with SNMP.
Severity Level and Trap Type
The severity level is major. The trap type is 4.
Cause
One of the two links to a single Cisco PGW 2200 has failed.
Troubleshooting
To clear the IP link failure alarm, complete the following steps:
Step 1
Use the rtrv-dest command to identify which PGW 2200 (standby or active) has been lost.
Step 2
Check the network connections, cables, and routers for that system.
Step 3
Use the clr-alm MML command to attempt to clear the alarm.
LOW_DISK_SPACE
Description
The disk space is low. This alarm is reported to the management interface and can be obtained with SNMP. The alarm automatically clears when the disk usage falls below the alarm limit.
Severity Level and Trap Type
The severity level is major. The trap type is 4.
Cause
The percentage of disk usage is greater than the alarm limit.
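To see how close a system is to the limit, disk usage can be computed with Python's standard shutil.disk_usage function. The low_disk_space helper and the 90 percent limit are illustrative assumptions, not the HSI's actual alarm limit. The sketch checks the file system that holds $GWHOME, or the current directory if GWHOME is not set.

```python
import os
import shutil


def disk_usage_percent(used: int, total: int) -> float:
    return 100.0 * used / total


def low_disk_space(used: int, total: int, limit_percent: float) -> bool:
    """Hypothetical check: True while usage exceeds the alarm limit."""
    return disk_usage_percent(used, total) > limit_percent


# Illustrative limit only; the HSI's own alarm limit may differ.
LIMIT_PERCENT = 90.0

path = os.environ.get("GWHOME", ".")
usage = shutil.disk_usage(path)
print(f"{path}: {disk_usage_percent(usage.used, usage.total):.1f}% used")
print("LOW_DISK_SPACE condition:", low_disk_space(usage.used, usage.total, LIMIT_PERCENT))
```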
Troubleshooting
To free disk space, you can, for example, remove old versions of installed software that are no longer required or archive log files from the $GWHOME/var/log directory.
OVERLOAD_LEVEL3
Description
An overload level 3 condition exists. This alarm is reported to the management interface and can be obtained with SNMP. This alarm automatically clears when the CPU occupancy or the number of active calls drops below the lower limits set in the overload configuration for level 3.
Severity Level and Trap Type
The severity level is major. The trap type is 4.
Cause
The OVERLOAD_LEVEL3 alarm is triggered when the CPU occupancy or the number of active calls rises above the upper limits set in the overload configuration for level 3. Gapping is then initiated.
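The raise and clear thresholds form a hysteresis band. The alarm is raised above the upper limits and stays raised until the metrics fall below the lower limits. The following Python sketch shows one overload level. OverloadLimits, OverloadLevel, and the threshold values are hypothetical. The sketch also assumes that the alarm clears only when both CPU occupancy and active calls are below their lower limits, which is one reading of the description above.

```python
from dataclasses import dataclass


@dataclass
class OverloadLimits:
    """Hypothetical per-level thresholds; real values come from the overload configuration."""

    cpu_upper: float
    cpu_lower: float
    calls_upper: int
    calls_lower: int


class OverloadLevel:
    def __init__(self, limits: OverloadLimits) -> None:
        self.limits = limits
        self.active = False

    def update(self, cpu_percent: float, active_calls: int) -> bool:
        lim = self.limits
        if not self.active:
            # Raise when either metric rises above its upper limit.
            if cpu_percent > lim.cpu_upper or active_calls > lim.calls_upper:
                self.active = True
        elif cpu_percent < lim.cpu_lower and active_calls < lim.calls_lower:
            # Assumption: clear only once both metrics are below their lower limits.
            self.active = False
        return self.active


level3 = OverloadLevel(OverloadLimits(cpu_upper=90.0, cpu_lower=80.0, calls_upper=1000, calls_lower=900))
print(level3.update(95.0, 500))  # True: CPU above its upper limit
print(level3.update(85.0, 500))  # True: CPU not yet below its lower limit
print(level3.update(75.0, 500))  # False: both metrics below their lower limits
```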
Troubleshooting
To clear the OVERLOAD_LEVEL3 alarm, complete the following steps:
Step 1
Wait for the number of calls to drop.
Step 2
If CPU occupancy remains high, request assistance from the system administrator.
VSC_FAILURE
Description
This alarm is derived by the Cisco HSI application from RUDP/SM events. This alarm is reported to the management interface and can be obtained with SNMP.
Severity Level and Trap Type
The severity level is major. The trap type is 5.
Cause
Links to both (active and standby) Cisco PGW 2200s have gone down.
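IP_LINK_FAILURE, EISUP_PATH_FAILURE, and VSC_FAILURE differ in how many links are down. The following Python sketch maps link states to the alarms whose causes match them. The link_alarms function and its input layout are hypothetical. The sketch assumes that every matching cause is reported, so a total loss of links yields EISUP_PATH_FAILURE for each Cisco PGW 2200 as well as VSC_FAILURE.

```python
def link_alarms(links_up: dict[str, dict[str, bool]]) -> list[str]:
    """Hypothetical mapping from link states to alarm names.

    links_up maps each Cisco PGW 2200 ("active", "standby") to the up/down
    state of its two IP links, "A" and "B".
    """
    alarms = []
    for pgw, links in sorted(links_up.items()):
        down = sorted(name for name, up in links.items() if not up)
        if down and len(down) == len(links):
            # Both links to a single PGW are down.
            alarms.append(f"EISUP_PATH_FAILURE ({pgw})")
        else:
            # One of the two links to a single PGW is down.
            for name in down:
                alarms.append(f"IP_LINK_FAILURE ({pgw}, link {name})")
    all_down = all(not up for links in links_up.values() for up in links.values())
    if links_up and all_down:
        # Links to both the active and standby PGW are down.
        alarms.append("VSC_FAILURE")
    return alarms


print(link_alarms({"active": {"A": False, "B": True}, "standby": {"A": True, "B": True}}))
# ['IP_LINK_FAILURE (active, link A)']
print(link_alarms({"active": {"A": False, "B": False}, "standby": {"A": False, "B": False}}))
# ['EISUP_PATH_FAILURE (active)', 'EISUP_PATH_FAILURE (standby)', 'VSC_FAILURE']
```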
Troubleshooting
To clear the VSC_FAILURE alarm, complete the following steps:
Step 1
Use the rtrv-dest command to confirm that links to the Cisco PGW 2200s have gone down.
Step 2
Check the network connections, cables, and routers.
Step 3
Refer to the Cisco Media Gateway Controller Software Release 9 Operations, Maintenance, and Troubleshooting Guide for detailed information about this alarm.
Step 4
Use the clr-alm command to attempt to clear the alarm.
HSI_NEEDS_UNLOCKING
Description
The Cisco HSI is locked. This alarm is reported to the management interface and can be obtained with SNMP.
Severity Level and Trap Type
The severity level is major. The trap type is 4.
Cause
The HSI license is invalid or has expired.
Troubleshooting
To clear the HSI_NEEDS_UNLOCKING alarm, complete the following steps:
Step 1
Issue the rtrv-lics command to confirm that the HSI license is invalid or expired.
Step 2
Confirm that the license files under /opt/GoldWing/license/ are invalid or expired.
Step 3
Contact the Cisco Technical Assistance Center (TAC) engineers to get valid license files.
Step 4
Put valid license files in the directory /opt/GoldWing/license/.
Step 5
Restart the HSI and issue the rtrv-lics command to confirm that the license is valid.
OVERLOAD_LEVEL2
Description
An overload level 2 condition exists. This alarm is reported to the management interface and can be obtained with SNMP. This alarm automatically clears when the CPU occupancy or the number of active calls drops below the lower limits set in the overload configuration for level 2.
Severity Level and Trap Type
The severity level is minor. The trap type is 4.
Cause
The OVERLOAD_LEVEL2 alarm is triggered when the CPU occupancy or the number of active calls rises above the upper limits set in the overload configuration for level 2. Gapping is then initiated.
Troubleshooting
In response to the OVERLOAD_LEVEL2 alarm, complete the following steps:
Step 1
Wait for the number of calls to drop.
Step 2
If CPU occupancy remains high, request assistance from the system administrator.
CONFIG_CHANGE
Description
The running configuration has been modified.
Severity Level and Trap Type
The severity level is information. The trap type is 0.
Cause
A new configuration has been activated within a provisioning session.
Troubleshooting
This is an informational event.
ENDPOINT_CALL_CONTROL_INTERFACE_FAILURE
Description
An individual call failure has occurred. This informational event is reported to the management interface and can be obtained with SNMP.
Severity Level and Trap Type
The severity level is information. The trap type is 3.
Cause
The RADVision stack reports this alarm.
Troubleshooting
This is an informational event.
ENDPOINT_CHANNEL_INTERFACE_FAILURE
Description
An individual call failure has occurred. This informational event is reported to the management interface and can be obtained with SNMP.
Severity Level and Trap Type
The severity level is information. The trap type is 3.
Cause
The RADVision stack reports this alarm.
Troubleshooting
This is an informational event.
GAPPED_CALL_NORMAL
Description
A normal call has been rejected due to call gapping. This informational event is reported to the management interface and can be obtained with SNMP.
Severity Level and Trap Type
The severity level is information. The trap type is 2.
Cause
The GAPPED_CALL_NORMAL alarm is triggered when gapping levels cause a normal call to be rejected.
Troubleshooting
To clear the GAPPED_CALL_NORMAL informational event, complete the following steps:
Step 1
Use the rtrv-gapping MML command to retrieve gapping information.
Step 2
If the MML-specific gap levels are active, use the set-gapping MML command to modify them.
Step 3
If the overload-specific gap levels are active, either modify the provisioned overload gapping percent levels or reduce the cause of the overload (see OVERLOAD_LEVEL1, OVERLOAD_LEVEL2, and OVERLOAD_LEVEL3).
GAPPED_CALL_PRIORITY
Description
A priority or emergency call has been rejected due to call gapping. This informational event is reported to the management interface and can be obtained with SNMP.
Severity Level and Trap Type
The severity level is information. The trap type is 2.
Cause
The GAPPED_CALL_PRIORITY alarm is triggered when gapping levels cause a priority or emergency call to be rejected.
Troubleshooting
To clear the GAPPED_CALL_PRIORITY informational event, complete the following steps:
Step 1
Change the MML gapping levels to less than 100 percent and change the call type to normal.
Step 2
Change the provisioned overload call filter type to normal.
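The effect of the steps above can be sketched with a simple percentage-based gapping filter. This Python sketch is illustrative. CallGapper is hypothetical, and the deterministic rejection pattern is an arbitrary choice. The sketch also assumes that a call filter type of "normal" gaps only normal calls, while "all" also gaps priority and emergency calls. Under those assumptions, setting the filter type to normal stops GAPPED_CALL_PRIORITY events.

```python
from typing import Optional


class CallGapper:
    """Hypothetical gapping filter.

    gap_percent is the share of filtered calls to reject. filter_type is
    assumed to be "normal" (gap normal calls only) or "all" (also gap
    priority and emergency calls).
    """

    def __init__(self, gap_percent: int, filter_type: str = "normal") -> None:
        self.gap_percent = gap_percent
        self.filter_type = filter_type
        self._offered = 0   # calls subject to gapping so far
        self._rejected = 0  # calls rejected so far

    def admit(self, priority: bool) -> tuple[bool, Optional[str]]:
        """Return (admitted, informational event raised, if any)."""
        if priority and self.filter_type == "normal":
            return True, None  # priority calls are exempt from a normal filter
        self._offered += 1
        # Reject just enough calls to keep the rejected share at gap_percent.
        if self._rejected * 100 < self.gap_percent * self._offered:
            self._rejected += 1
            event = "GAPPED_CALL_PRIORITY" if priority else "GAPPED_CALL_NORMAL"
            return False, event
        return True, None


strict = CallGapper(gap_percent=100, filter_type="all")
print(strict.admit(priority=True))    # (False, 'GAPPED_CALL_PRIORITY')
relaxed = CallGapper(gap_percent=100, filter_type="normal")
print(relaxed.admit(priority=True))   # (True, None)
print(relaxed.admit(priority=False))  # (False, 'GAPPED_CALL_NORMAL')
```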
OVERLOAD_LEVEL1
Description
An overload level 1 condition exists. This informational event is reported to the management interface and can be obtained with SNMP.
Severity Level and Trap Type
The severity level is information. The trap type is 4.
Cause
The OVERLOAD_LEVEL1 alarm is triggered when the CPU occupancy or the number of active calls rises above the upper limits set in the overload configuration for level 1. Gapping is then initiated.
Troubleshooting
In response to the OVERLOAD_LEVEL1 informational event, complete the following steps:
Step 1
Wait for the number of calls to drop.
Step 2
If CPU occupancy remains high, request assistance from the system administrator.
PROVISIONING_INACTIVITY_TIMEOUT
Description
A provisioning session has been inactive for 20 minutes. The text of the output is:
"H323-GW1:2001-01-30 11:12:57.421,A^ ALM=\"PROVISIONING INACTIVITY TIMEOUT\",SEV=IF"
Severity Level and Trap Type
The severity level is information. The trap type is 3.
Cause
The provisioning session has been inactive for 20 minutes. The provisioning session will be closed if there is no activity within the next 5 minutes.
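The two provisioning timeouts can be expressed as thresholds on idle time. The 20-minute warning and the 5-minute grace period are taken from this section. provisioning_event is a hypothetical Python helper that returns the most recent timeout event that a session idle for the given time would have produced.

```python
from typing import Optional

WARNING_AFTER_MIN = 20  # PROVISIONING_INACTIVITY_TIMEOUT after 20 idle minutes
CLOSE_AFTER_MIN = 25    # session closed 5 minutes after the warning


def provisioning_event(idle_minutes: float) -> Optional[str]:
    """Hypothetical helper: latest timeout event for a session idle this long."""
    if idle_minutes >= CLOSE_AFTER_MIN:
        return "PROVISIONING_SESSION_TIMEOUT"
    if idle_minutes >= WARNING_AFTER_MIN:
        return "PROVISIONING_INACTIVITY_TIMEOUT"
    return None


print(provisioning_event(10))  # None
print(provisioning_event(20))  # PROVISIONING_INACTIVITY_TIMEOUT
print(provisioning_event(25))  # PROVISIONING_SESSION_TIMEOUT
```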
Troubleshooting
Ensure that activity in the provisioning session occurs at least every 20 minutes.
PROVISIONING_SESSION_TIMEOUT
Description
The current session has been terminated. The text of the output is:
"H323-GW1:2001-01-30 11:17:57.422,A^ ALM=\"PROVISIONING SESSION TIMEOUT\",SEV=IF"
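A regular expression is enough to extract fields from event text like the two samples above. This Python sketch assumes that the backslashes in the samples are string escapes, so the actual output contains plain double quotes. It also captures the A^ field without interpreting it. parse_alarm_line and the field names are hypothetical. As a check, the two sample timestamps are about five minutes apart, which matches the grace period before the session is closed.

```python
import re
from datetime import datetime

# Hypothetical pattern for lines such as:
# H323-GW1:2001-01-30 11:17:57.422,A^ ALM="PROVISIONING SESSION TIMEOUT",SEV=IF
ALARM_LINE = re.compile(
    r'^(?P<node>[^:]+):'
    r'(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}),'
    r'(?P<tag>\S+)\s+'
    r'ALM="(?P<alarm>[^"]*)",'
    r'SEV=(?P<sev>\w+)$'
)


def parse_alarm_line(line: str) -> dict:
    """Hypothetical parser: split one event line into its fields."""
    m = ALARM_LINE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized alarm line: {line!r}")
    fields = m.groupdict()
    fields["ts"] = datetime.strptime(fields["ts"], "%Y-%m-%d %H:%M:%S.%f")
    return fields


warning = parse_alarm_line('H323-GW1:2001-01-30 11:12:57.421,A^ ALM="PROVISIONING INACTIVITY TIMEOUT",SEV=IF')
closed = parse_alarm_line('H323-GW1:2001-01-30 11:17:57.422,A^ ALM="PROVISIONING SESSION TIMEOUT",SEV=IF')
print(closed["alarm"])                                 # PROVISIONING SESSION TIMEOUT
print((closed["ts"] - warning["ts"]).total_seconds())  # 300.001
```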
Severity Level and Trap Type
The severity level is information. The trap type is 3.
Cause
The provisioning session has been inactive for longer than the time allowed (20 minutes, plus the 5-minute period that follows the PROVISIONING_INACTIVITY_TIMEOUT event).
Troubleshooting
Ensure that activity within the provisioning session occurs at least every 20 minutes.
STOP_CALL_PROCESSING
Description
A stop call processing request has been entered through the MML.
Severity Level and Trap Type
The severity level is information. The trap type is 4.
Cause
A user has entered the stp-callproc command through the MML.
Troubleshooting
This is an informational event.