Voice Health Monitor User Guide, 1.1
Voice Faults and Exceptions



A fault is an abnormal condition that occurs when a system or a system component violates a performance threshold or is not functioning properly. An exception is a group of related faults.

VHM groups related faults into a single exception. That is, it generates a single exception of a given type per device, regardless of the number of faults that exist.
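The following Python sketch illustrates this grouping rule. It is not VHM's implementation. The `Fault` record, its fields, the `group_into_exceptions` helper, and the device names are hypothetical. Faults are keyed by device and exception type, so any number of faults of one type on one device yield a single exception.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fault:
    """Hypothetical fault record used only for this illustration."""
    device: str          # device on which the fault was detected
    name: str            # fault name, for example "Fan Down"
    exception_type: str  # notification type from the fault tables


def group_into_exceptions(faults):
    """Group faults into one exception per (device, exception type).

    Several faults of the same type on one device are collected under a
    single key, so together they produce a single exception.
    """
    exceptions = defaultdict(list)
    for fault in faults:
        exceptions[(fault.device, fault.exception_type)].append(fault.name)
    return dict(exceptions)


faults = [
    Fault("mcs-1", "High Processor Utilization", "Resource exception"),
    Fault("mcs-1", "Insufficient Free Physical Memory", "Resource exception"),
    Fault("mcs-1", "Fan Down", "Temperature exception"),
    Fault("gw-1", "High Utilization (CPU)", "Resource exception"),
]

for (device, exception_type), names in group_into_exceptions(faults).items():
    print(f"{device}: {exception_type} <- {', '.join(names)}")
```

In this example the two resource faults on mcs-1 collapse into one resource exception, so four faults produce three exceptions.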

The following topics are discussed:

Overview of Faults and Exceptions

Media Server Faults

ICS 7750 Faults

Voice Gateway and Phone Access Switch Faults

Voice Mail Gateway Faults

Monitored Phone Faults

Gatekeeper Fault

Suspect Phone Fault

Voice Cluster Faults

General VHM Faults

Pass-Through Traps

Overview of Faults and Exceptions

VHM obtains event information by polling SNMP MIBs and by subscribing to voice-related events received by DFM. It analyzes this information and generates faults for voice-enabled devices.

Users can review a summary of faults on the Real-Time Dashboard (see the "Using the Real-Time Dashboard" section) and view the generated alarms on the Monitoring Console (see the "Using the Monitoring Console" section).

For additional overview information, see the following topics:

Device Types that Generate Faults

Contents of the Fault Tables

Device Types that Generate Faults

VHM generates faults for the voice device groups that comprise the following types of voice-enabled devices:

Voice Clusters

Media Servers

ICS 7750

Voice Gateways—VHM obtains some of the event information for voice gateways from DFM.

Phone Access Switches—VHM obtains some of the event information for Phone Access Switches from DFM.

Voice Mail Gateways

Monitored Phones

Contents of the Fault Tables

Fault tables include the following types of information:

Managed entity type where faults can be detected.

Faults that are detected.

User-configurable thresholds that are used to define the tolerance limits of each fault condition.

Exceptions raised by VHM when a fault condition exceeds thresholds.

Media Server Faults

Table 3-1 lists the media server faults diagnosed by VHM.


Note: IBM environment attributes (temperature, fan, and power supply) are not supported.


Table 3-1 Media Server Faults  

Managed Entity | Faults | Thresholds | Notification
Media Convergence Server (MCS) | Unresponsive; SNMP Agent Not Responding | - | Operational exception
Media Convergence Server (MCS) | High Processor Utilization; Insufficient Free Hard Disk Space; Insufficient Free Physical Memory; Insufficient Free Virtual Memory | ProcessUtilizationThreshold; FreeHardDiskThreshold; FreePhysicalMemoryThreshold; FreeVirtualMemoryThreshold | Resource exception
Media Convergence Server (MCS) | Temperature High; Temperature Sensor Down; Temperature Sensor Degraded; Fan Down; Fan Degraded | TemperatureCelsiusThreshold | Temperature exception
Media Convergence Server (MCS) | Power Supply Down; Power Supply Degraded | - | Power supply exception
Interface | Interface Operationally Down | - | Operational exception
Application (Cisco CallManager, Workflow Application, Database Server, Conference Bridge, TFTP Server) | Transaction Failed | - | Application monitor exception
Application | Too Many Failed Synthetic Transactions | FailureThreshold | Application monitor exception
Application | Application Down | - | Application exception
Cisco CallManager | CallManager Down | - | Application exception
Cisco CallManager (release 3.1 and 3.2 only) | Discovery Failed | - | -
Cisco CallManager (release 3.2 only) | TooManySuspectPhones | - | -
Remote Insight Board (RIB), MCS-7830 models only | Battery Low; Battery Failed; Battery Disconnected | BatteryPercentChargedThreshold | Power supply exception


Media server faults are described in more detail, grouped by notification type, in the following sections:

Operational Exceptions

Resource Exceptions

Temperature Exceptions

Power Supply Exceptions

Application Exceptions

Application Monitor Exceptions

Operational Exceptions

VHM generates an operational exception for multiple occurrences of the following faults (for a complete list of media server faults, see Table 3-1):

Unresponsive—The device is unreachable from the DFM server.

Interface Operationally Down—The interface is nonoperational.

Resource Exceptions

VHM generates a resource exception for multiple occurrences of the following faults (for a complete list of media server faults, see Table 3-1):

High Processor Utilization—The processor utilization exceeds the threshold. ProcessUtilizationThreshold defines the upper limit for CPU utilization and is expressed as a percentage of total CPU capacity.

Insufficient Free Hard Disk Space—The free disk space is less than the low free disk space threshold (FreeHardDiskThreshold).

Insufficient Free Physical Memory—The system is running out of memory resources; free physical memory has dropped below the FreePhysicalMemoryThreshold value.

Insufficient Free Virtual Memory—The system is running out of virtual memory resources; free virtual memory has dropped below the FreeVirtualMemoryThreshold value.
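As a minimal sketch of these comparisons, assume that each sample and threshold set is available as a Python dictionary (a hypothetical layout, not a VHM interface). The guide does not state the units of the free-space thresholds, so the sketch only assumes that each reading uses the same unit as its threshold; the sample values are illustrative, not VHM defaults.

```python
def resource_faults(sample, thresholds):
    """Return the names of the resource faults raised by one sample.

    The threshold keys mirror the names in Table 3-1; the sample keys
    are hypothetical. CPU values are percentages of total CPU capacity.
    """
    faults = []
    # Utilization faults fire when the reading exceeds the upper limit.
    if sample["cpu_percent"] > thresholds["ProcessUtilizationThreshold"]:
        faults.append("High Processor Utilization")
    # Free-resource faults fire when the reading is below the lower limit.
    if sample["free_disk"] < thresholds["FreeHardDiskThreshold"]:
        faults.append("Insufficient Free Hard Disk Space")
    if sample["free_physical_memory"] < thresholds["FreePhysicalMemoryThreshold"]:
        faults.append("Insufficient Free Physical Memory")
    if sample["free_virtual_memory"] < thresholds["FreeVirtualMemoryThreshold"]:
        faults.append("Insufficient Free Virtual Memory")
    return faults


# Illustrative values only.
thresholds = {
    "ProcessUtilizationThreshold": 90,
    "FreeHardDiskThreshold": 500,
    "FreePhysicalMemoryThreshold": 64,
    "FreeVirtualMemoryThreshold": 128,
}
sample = {
    "cpu_percent": 95,
    "free_disk": 2000,
    "free_physical_memory": 32,
    "free_virtual_memory": 256,
}
print(resource_faults(sample, thresholds))
```

With these values the sample raises High Processor Utilization and Insufficient Free Physical Memory, which VHM groups into a single resource exception for the server.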

Temperature Exceptions

VHM generates a temperature exception for multiple occurrences of the following faults (for a complete list of media server faults, see Table 3-1):

System Temperature Sensor Down/Degraded—The temperature sensor is reporting abnormal temperature measurements. Possible conditions are OK, Degraded, and Failed.

Temperature High—The operating temperature is higher than the temperature threshold (TemperatureCelsiusThreshold).

Fan Down/Degraded—The system fan condition is not normal. The possible conditions are OK, Degraded, and Failed.

Power Supply Exceptions

VHM generates a power supply exception for multiple occurrences of the following faults (for a complete list of media server faults, see Table 3-1):

Battery Low/Failed—The Remote Insight Board battery status is not normal; it is either Low or Failed.

Battery Disconnected—The Remote Insight Board battery is disconnected.

Power Supply Down/Degraded—The power supply is not in a normal state. The possible states are OK, Degraded, and Failed.

Application Exceptions

VHM generates an application exception for multiple occurrences of the following faults (for a complete list of media server faults, see Table 3-1):

CallManager Down—Cisco CallManager is not running.

Application Down—The application is not running.

Application Monitor Exceptions

VHM generates an application monitor exception for multiple occurrences of the following fault (for a complete list of media server faults, see Table 3-1):

Transaction Failed—Synthetic transactions on the application were unsuccessful.

ICS 7750 Faults

Table 3-2 lists the ICS 7750 faults diagnosed by VHM.

Table 3-2 ICS 7750 Faults 

Managed Entity | Faults | Thresholds | Notification
ICS 7750 | Unresponsive | - | Operational exception
ICS 7750 | Power Supply Down | - | Power supply exception
ICS 7750 | Fan Down | - | Temperature exception
SPEs | CallManager Down | - | Application exception
SPEs | Insufficient Free Disk Space; Insufficient Free Virtual Memory | FreeHardDiskThreshold; FreeVirtualMemoryThreshold | Resource exception
SPEs | Unresponsive | - | Operational exception
Multiservice Route Processor (MRP) | High Utilization; Insufficient Free Memory | ProcessUtilizationThreshold; FreePhysicalMemoryThreshold | Resource exception
Multiservice Route Processor (MRP) | Interface Operationally Down; Unresponsive | - | Operational exception
System Switch Processor (SSP) | High Utilization; Insufficient Free Memory | ProcessUtilizationThreshold; FreePhysicalMemoryThreshold | Resource exception
System Switch Processor (SSP) | Interface Operationally Down; Unresponsive | - | Operational exception


ICS 7750 faults are described in more detail, grouped by notification type, in the following sections:

Operational Exceptions

Resource Exceptions

Power Supply Exceptions

Application Exceptions

Temperature Exceptions

Operational Exceptions

VHM generates an operational exception for multiple occurrences of the following faults (for a complete list of ICS 7750 faults, see Table 3-2):

Unresponsive—If one of the ICS 7750 entities (for example, the media server, trunk card, or BPS) is down, a fault is generated.

Interface Operationally Down—The interface status is down.

Resource Exceptions

VHM generates a resource exception for multiple occurrences of the following faults (for a complete list of ICS 7750 faults, see Table 3-2):

High Utilization—The processor utilization exceeds the processor utilization threshold. ProcessUtilizationThreshold defines the upper limit for CPU utilization and is expressed as a percentage of total CPU capacity.

Insufficient Free Disk Space—The free disk space is less than the low free disk space threshold (FreeHardDiskThreshold).

Insufficient Free Memory—The system is running out of memory resources; free physical memory has dropped below FreePhysicalMemoryThreshold.

Insufficient Free Virtual Memory—The system is running out of virtual memory resources; free virtual memory has dropped below FreeVirtualMemoryThreshold.

Power Supply Exceptions

VHM generates a power supply exception for multiple occurrences of the following faults (for a complete list of ICS 7750 faults, see Table 3-2):

Power Supply State Down/Degraded—The power supply is not in a normal state. The possible states are OK, Degraded, and Failed.

Application Exceptions

VHM generates an application exception for multiple occurrences of the following fault (for a complete list of ICS 7750 faults, see Table 3-2):

CallManager Down—Cisco CallManager is not running.

Temperature Exceptions

VHM generates a temperature exception for multiple occurrences of the following fault (for a complete list of ICS 7750 faults, see Table 3-2):

Fan Down/Degraded—The system fan condition is not normal. The possible conditions are OK, Degraded, and Failed.

Voice Gateway and Phone Access Switch Faults

Table 3-3 displays the Voice Gateway and Phone Access Switch faults diagnosed by DFM and further processed by VHM.

Table 3-3 Voice Gateway and Phone Access Switch Faults  

Managed Entities | Faults | Thresholds | Notification
Digital Voice Gateway | Interface Operationally Down | - | Operational exception
Digital Voice Gateway | Lost Contact with Cluster | - | Connectivity exception
Voice Gateway, Phone Access Switch | Unresponsive; Unresponsive (SNMP agent); Voice Port Operationally Down; Interface Operationally Down; Voice Port Administratively Down; Interface Administratively Down; Phone Removed; Card Down | - | Operational exception
Voice Gateway, Phone Access Switch | High Utilization (CPU); Insufficient Free Memory | ProcessUtilizationThreshold; FreePhysicalMemoryThreshold | Resource exception
Voice Gateway, Phone Access Switch | Temperature Sensor Degraded; Temperature Sensor Down; Fan Down; Fan Degraded | - | Temperature exception
Voice Gateway, Phone Access Switch | Power Supply Degraded; Power Supply Down | - | Power supply exception
Voice Gateway, Phone Access Switch | Port Lost Contact with Cluster; Gateway Lost Contact with Cluster; Voice Interface Lost Contact with Cluster; Voice Card Lost Contact with Cluster | - | Connectivity exception


Voice Gateway and Phone Access Switch faults are described in more detail, grouped by notification type, in the following sections:

Operational Exceptions

Resource Exceptions

Temperature Exceptions

Power Supply Exceptions

Connectivity Exceptions

Operational Exceptions

VHM generates an operational exception for multiple occurrences of the following faults (for a complete list of Voice Gateway and Phone Access Switch faults, see Table 3-3):

Unresponsive—The device is not reachable. ICMP pings sent by the VHM server timed out.

Unresponsive (SNMP agent)—The device is not responding to SNMP requests. ICMP pings succeed, but SNMP requests time out.

Interface Operationally Down—A voice interface is operationally down.

Interface Administratively Down—A voice interface is administratively down (disabled by configuration).

Voice Port Operationally Down—A voice port is operationally down.

Voice Port Administratively Down—A voice port is administratively down (disabled by configuration).

Phone Removed—An IP phone lost its network connection to the switch. This fault occurs only during rediscovery of the switch (through either manual rediscovery or nightly inventory collection).

Card Down—A voice card is down.
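The distinction between the two Unresponsive faults can be sketched as a decision on two probe results. This is an illustration, not VHM code: the function name and its boolean inputs are hypothetical, and reporting only Unresponsive when the ping fails (rather than both faults) is an assumption of this sketch.

```python
from typing import Optional


def classify_reachability(ping_ok: bool, snmp_ok: bool) -> Optional[str]:
    """Return the reachability fault for one probe result, or None.

    ping_ok: True if ICMP pings from the VHM server were answered.
    snmp_ok: True if SNMP requests were answered before timing out.
    """
    if not ping_ok:
        # The device itself is unreachable; SNMP is not evaluated separately.
        return "Unresponsive"
    if not snmp_ok:
        # The device answers pings, but its SNMP agent does not respond.
        return "Unresponsive (SNMP agent)"
    return None


for ping_ok, snmp_ok in [(False, False), (True, False), (True, True)]:
    print(ping_ok, snmp_ok, classify_reachability(ping_ok, snmp_ok))
```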

Resource Exceptions

VHM generates a resource exception for multiple occurrences of the following faults (for a complete list of Voice Gateway and Phone Access Switch faults, see Table 3-3):

High Utilization—The processor utilization exceeds the CPU utilization threshold. ProcessUtilizationThreshold defines the upper limit for CPU utilization and is expressed as a percentage of total CPU capacity.

Insufficient Free Memory—The system is running out of memory resources; free physical memory has dropped below the FreePhysicalMemoryThreshold value.

Temperature Exceptions

VHM generates a temperature exception for multiple occurrences of the following faults (for a complete list of Voice Gateway and Phone Access Switch faults, see Table 3-3):

Temperature Sensor Degraded—The temperature sensor condition is Degraded.

Temperature Sensor Down—The temperature sensor condition is Failed.

Fan Degraded—The fan condition is Degraded.

Fan Down—The fan condition is Failed.

Power Supply Exceptions

VHM generates a power supply exception for multiple occurrences of the following faults (for a complete list of Voice Gateway and Phone Access Switch faults, see Table 3-3):

Power Supply Degraded—The power supply is not in a normal state. The state is Degraded.

Power Supply Down—The power supply is not in a normal state. The state is Down.

Connectivity Exceptions

VHM generates a connectivity exception for multiple occurrences of the following faults (for a complete list of Voice Gateway and Phone Access Switch faults, see Table 3-3):

Lost Contact with Cluster—A digital voice interface lost registration with a Cisco CallManager cluster.

Port Lost Contact with Cluster—A voice port lost registration with a Cisco CallManager cluster.

Gateway Lost Contact with Cluster—A voice gateway lost registration with a Cisco CallManager cluster.

Voice Interface Lost Contact with Cluster—A voice interface lost registration with a Cisco CallManager cluster.

Voice Card Lost Contact with Cluster—A voice card lost registration with a Cisco CallManager cluster.

Voice Mail Gateway Faults

Table 3-4 displays the Voice Mail Gateway faults diagnosed by DFM and further processed by VHM.

Table 3-4 Voice Mail Gateway Faults  

Managed Entities | Faults | Thresholds | Notification
Voice Mail Gateways | Unresponsive; Interface Operationally Down; Interface Administratively Down | - | Operational exception
Voice Mail Gateways | Port Lost Contact with Cluster | - | Connectivity exception
Voice Mail Gateways | DPA Port CallManager Link Down | - | DPA CallManager link exception
Voice Mail Gateways | DPA Port Telephony Link Down | - | DPA telephony link exception
Voice Mail Gateways | High Utilization (CPU) | ProcessUtilizationThreshold | Resource exception


Voice Mail Gateway faults are described in more detail, grouped by notification type, in the following sections:

Operational Exceptions

Resource Exceptions

Connectivity Exceptions

Other Exceptions

Operational Exceptions

VHM generates an operational exception for multiple occurrences of the following faults (for a complete list of Voice Mail Gateway faults, see Table 3-4):

Unresponsive—The device is not reachable.

Interface Operationally Down—A voice interface is operationally down.

Interface Administratively Down—A voice interface is administratively down (disabled by configuration).

Resource Exceptions

VHM generates a resource exception for multiple occurrences of the following faults (for a complete list of Voice Mail Gateway faults, see Table 3-4):

High Utilization—The processor utilization exceeds the CPU utilization threshold. ProcessUtilizationThreshold defines the upper limit for CPU utilization and is expressed as a percentage of total CPU capacity.

Connectivity Exceptions

VHM generates a connectivity exception for multiple occurrences of the following fault (for a complete list of Voice Mail Gateway faults, see Table 3-4):

Port Lost Contact with Cluster—The DPA port lost contact with the cluster.

Other Exceptions

VHM generates a DPA CallManager link exception or a DPA telephony link exception, respectively, for multiple occurrences of the following faults (for a complete list of Voice Mail Gateway faults, see Table 3-4):

DPA Port CallManager Link Down—There is no connectivity between the DPA port and Cisco CallManager.

DPA Port Telephony Link Down—There is no connectivity between the DPA port and the Octel voice-mail system.

Monitored Phone Faults

Table 3-5 displays the Monitored Phone faults diagnosed by DFM and further processed by VHM.

Table 3-5 Monitored Phone Faults  

Managed Entities | Faults | Thresholds | Notification
Monitored Phones | Unresponsive | - | Operational exception
Monitored Phones | Monitored Phone Lost Contact with Cluster | - | Connectivity exception
Monitored Phones | Extension Number Removed | - | -
Monitored Phones | Phone Discovery Error | - | -

Monitored Phone faults are described in more detail, grouped by notification type, in the following sections:

Operational Exceptions

Connectivity Exceptions

Operational Exceptions

VHM generates an operational exception for multiple occurrences of the following faults (for a complete list of Monitored Phone faults, see Table 3-5):

Unresponsive—The phone is not reachable. ICMP pings sent by the VHM server timed out.

Connectivity Exceptions

VHM generates a connectivity exception for multiple occurrences of the following fault (for a complete list of Monitored Phone faults, see Table 3-5):

Monitored Phone Lost Contact with Cluster—The monitored phone lost contact with all Cisco CallManagers in the cluster.

Gatekeeper Fault

Table 3-6 displays the gatekeeper fault diagnosed by DFM and further processed by VHM.

Table 3-6 Gatekeeper Faults  

Managed Entities | Faults | Thresholds | Notification
Gatekeeper | Gatekeeper Lost Contact with Cluster | - | Connectivity exception


Connectivity Exceptions

VHM generates a connectivity exception for multiple occurrences of the following fault:

Gatekeeper Lost Contact with Cluster—The gatekeeper lost registration with the Cisco CallManager cluster.

Suspect Phone Fault

Table 3-7 displays the suspect phone fault diagnosed by DFM and further processed by VHM.

Table 3-7 Suspect Phone Faults  

Managed Entities | Faults | Thresholds | Notification
Suspect Phone | Suspect Phone Detected | - | Operational exception


Operational Exceptions

VHM generates an operational exception for multiple occurrences of the following fault:

Suspect Phone Detected—The phone cannot register to a Cisco CallManager.

Voice Cluster Faults

Table 3-8 displays voice cluster faults diagnosed by DFM and further processed by VHM.

Table 3-8 Voice Cluster Faults  

Managed Entities | Faults | Thresholds | Notification
Voice Cluster | Too Many Inactive Phones | InactivePhoneThreshold | Operational exception
Voice Cluster | CCM HTTP Service Down | - | Operational exception

Operational Exceptions

VHM generates an operational exception for multiple occurrences of the following faults:

Too Many Inactive Phones—The number of inactive phones exceeds InactivePhoneThreshold, which is expressed as a percentage of the total number of phones connected to a Cisco CallManager cluster.

CCM HTTP Service Down—VHM cannot use the HTTP service to communicate with all Cisco CallManagers in the cluster.
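A worked sketch of the Too Many Inactive Phones check follows. It assumes the fault is raised when the inactive share strictly exceeds the threshold; the function name is hypothetical, and the handling of a cluster with no phones is an assumption of this sketch.

```python
def too_many_inactive_phones(inactive: int, total: int,
                             threshold_percent: float) -> bool:
    """Evaluate the Too Many Inactive Phones condition for one cluster.

    threshold_percent plays the role of InactivePhoneThreshold: a
    percentage of all phones connected to the Cisco CallManager cluster.
    """
    if total == 0:
        # Assumption: a cluster with no phones never raises the fault.
        return False
    inactive_percent = 100.0 * inactive / total
    return inactive_percent > threshold_percent


# 30 of 200 phones inactive is 15 percent, above a 10 percent threshold.
print(too_many_inactive_phones(30, 200, 10))  # True
# 15 of 200 phones inactive is 7.5 percent, below the same threshold.
print(too_many_inactive_phones(15, 200, 10))  # False
```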

General VHM Faults

The following are general faults that VHM displays:

DFM Server Down—VHM lost contact with the DFM server.

Synthetic Transaction Server Down—VHM lost contact with the Synthetic Transaction server.

ESS Connectivity Lost—VHM cannot communicate with the ESS bus.

VHM Domain Connectivity Lost—VHM lost contact with the domain.

Discovery Error—Discovery did not complete.

Pass-Through Traps

Table 3-9 lists the pass-through traps that VHM processes for Cisco CallManager.

Table 3-9 Pass-Through Traps—Cisco CallManager  

Pass-Through Trap | Description
CCMGateWayFailedException | A gateway has failed in its attempt to register or communicate with a Cisco CallManager.
CCMMediaResourceListExhaustedException | Cisco CallManager has run out of resources.
CCMCallManagerFailedException | Cisco CallManager detects a failure in one of its critical subsystems.
CCMGatewayLayer2ChangeException | The D-Channel/Layer 2 of an interface in a digital gateway that is registered with Cisco CallManager changes state.


Table 3-10 lists the pass-through traps that VHM processes for Media Servers (IBM systems).

Table 3-10 Pass-Through Traps—Media Servers (IBM systems)  

Pass-Through Trap | Description
IBMFanEventException | A fan is down.
IBMVoltageEventException | The voltage is not correct.
IBMTemperatureEventException | The temperature is high.


Table 3-11 lists the pass-through traps that VHM processes for voice services.

Table 3-11 Pass-Through Traps—Voice Services  

Pass-Through Trap | Description
VoiceServiceModuleStopException | An application module or subsystem has stopped.
VoiceServiceModuleStartException | An application module or subsystem has successfully started and has transitioned to in-service state.
VoiceServiceRunTimeFailureException | A run-time failure has occurred.
VoiceServiceProcessStartException | A process has just started.
VoiceServiceProcessStopException | A process has just stopped.