Voice Health Monitor User Guide, 1.0
Voice Faults and Exceptions


Voice Faults and Exceptions


A fault is an abnormal condition that occurs when a system or a system component violates a performance threshold or is not functioning properly. An exception is a group of related faults.

VHM groups related faults into a single exception. That is, it generates a single exception of a given type per device, regardless of the number of faults that exist.
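The grouping rule above can be illustrated with a minimal sketch. It is not VHM's implementation; the device names, the record layout, and the function name `group_into_exceptions` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical fault records: (device, exception type, fault name).
faults = [
    ("mcs-1", "Resource exception", "High CPU Utilization"),
    ("mcs-1", "Resource exception", "Insufficient Free Physical Memory"),
    ("mcs-1", "Operational exception", "Interface Down"),
    ("gw-1", "Resource exception", "High Utilization"),
]


def group_into_exceptions(fault_records):
    """Collapse faults into one exception per (device, exception type)."""
    exceptions = defaultdict(list)
    for device, exception_type, fault_name in fault_records:
        exceptions[(device, exception_type)].append(fault_name)
    return dict(exceptions)


exceptions = group_into_exceptions(faults)
for (device, exception_type), names in exceptions.items():
    print(f"{device}: {exception_type} covering {len(names)} fault(s)")
```

In this sketch the two resource faults on `mcs-1` produce a single resource exception, so the four faults yield three exceptions.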

The following topics are discussed:

Overview of Faults and Exceptions

Media Server Faults

ICS 7750 Faults

Voice Gateway and Inline Power Switch Faults

Overview of Faults and Exceptions

VHM obtains event information by polling SNMP MIBs and by subscribing to voice-related events received by DFM. It analyzes this information and generates faults for voice-enabled devices.

Users can review a summary of faults on the Real-Time Dashboard (see the "Using the Real-Time Dashboard" section) and view the generated alarms on the Monitoring Console (see the "Using the Monitoring Console" section).

For additional overview information, see the following topics:

Device Types that Generate Faults

Contents of the Fault Tables

Device Types that Generate Faults

VHM generates faults for the voice device groups that contain the following types of voice-enabled devices:

Voice Clusters.

Media Servers.

ICS 7750.

Voice Gateways—VHM obtains some of the event information for voice gateways from DFM.

Inline Power Switches—VHM obtains some of the event information for Inline Power Switches from DFM.

Contents of the Fault Tables

Fault tables include the following types of information:

Managed entity type where faults can be detected.

Faults that are detected.

User-configurable thresholds that are used to define the tolerance limits of each fault condition.

Exceptions raised by VHM when a fault condition exceeds thresholds.
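One way to picture a fault table row is as a small record with these four kinds of information. The sketch below is only an illustration: the class `FaultTableRow` and the helper `notification_for` are hypothetical names, and the two sample rows are taken from the media server fault table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaultTableRow:
    """One row of a fault table (hypothetical representation)."""
    managed_entity: str
    faults: tuple
    notification: str
    thresholds: tuple = ()  # empty when the row lists no threshold


ROWS = [
    FaultTableRow(
        "Voice Cluster",
        ("Too Many Inactive Phones",),
        "Operational exception",
        ("InActivePhoneThreshold",),
    ),
    FaultTableRow("Interface", ("Interface Down",), "Operational exception"),
]


def notification_for(entity, fault, rows=ROWS):
    """Return the exception raised for a fault on an entity, or None."""
    for row in rows:
        if row.managed_entity == entity and fault in row.faults:
            return row.notification
    return None


print(notification_for("Interface", "Interface Down"))
```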

Media Server Faults

Table 3-1 lists the media server faults diagnosed by VHM.

Table 3-1 Media Server Faults

Managed Entity | Faults | Thresholds | Notification
Media Convergence Server (MCS) | Unreachable; SNMP Agent Not Responding | (none) | Operational exception
Media Convergence Server (MCS) | High CPU Utilization; Low Free Disk Space; Insufficient Free Physical Memory; Insufficient Free Virtual Memory | ProcessUtilizationThreshold; FreeHardDiskThreshold; FreePhysicalMemoryThreshold; FreeVirtualMemoryThreshold | Resource exception
Media Convergence Server (MCS) | System Temperature High; System Fan State Down; CPU Fan Status Down | TemperatureCelsiusThreshold | Temperature exception
Media Convergence Server (MCS) | Power Supply Status Down; Power Supply Status Degraded | (none) | Power supply exception
Interface | Interface Down | (none) | Operational exception
Voice Cluster | Too Many Inactive Phones | InActivePhoneThreshold | Operational exception
Application (Cisco CallManager, Workflow Application, Database Server, Conference Bridge, TFTP Server) | Application Down | (none) | Application exception
CallManager | Synthetic Phone Registration Failed; Synthetic Off Hook Transaction Failed; Synthetic Empty Call Failed; Synthetic End to End Call Failed; Synthetic TFTP Download Failed; Synthetic Phone to Gateway Call Failed; Synthetic Phone to Bridge Call Failed; Synthetic Transaction Server Down | (none) | Application monitor exception
Remote Insight Board (RIB) | RIB Battery Status Down; RIB Battery Status Degraded; RIB Battery Charge Status Down | BatteryPercentChargedThreshold | Power supply exception
Voice Services | Synthetic Transaction Failed | (none) | Application monitor exception

Media server faults are described in more detail, grouped by notification type:

Operational Exceptions

Resource Exceptions

Temperature Exceptions

Power Supply Exceptions

Application Exceptions

Application Monitor Exceptions

Operational Exceptions

VHM generates an operational exception for multiple occurrences of the following faults (for a complete list of media server faults, see Table 3-1):

Unreachable—The device is unreachable from the DFM server.

SNMP Agent Not Responding—The device is not responding to SNMP requests. ICMP pings succeed, but SNMP requests time out.

Interface Down—The interface is nonoperational.

Too Many Inactive Phones—The percentage of inactive phones on a Cisco CallManager (CCM) exceeds the InActivePhoneThreshold value.
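The inactive-phone check in the last item is a percentage comparison. The following is a rough sketch under stated assumptions: the function name, the phone counts, and the threshold value of 30 are hypothetical, and the handling of a CCM with no phones is a choice made for this example.

```python
def too_many_inactive_phones(inactive, total, inactive_phone_threshold):
    """Return True when the inactive percentage exceeds the threshold."""
    if total == 0:
        # No phones are registered, so no percentage can be computed.
        return False
    percent_inactive = 100.0 * inactive / total
    return percent_inactive > inactive_phone_threshold


# 40 of 100 phones inactive against an assumed threshold of 30 percent.
print(too_many_inactive_phones(40, 100, 30))
```

The comparison is strict, so a value exactly equal to the threshold does not raise the fault in this sketch.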

Resource Exceptions

VHM generates a resource exception for multiple occurrences of the following faults (for a complete list of media server faults, see Table 3-1):

High CPU Utilization—The processor utilization exceeds the processor utilization threshold. ProcessUtilizationThreshold defines the upper limit for CPU utilization and is expressed as a percentage of total CPU capacity.

Insufficient Free Disk Space—The free disk space is less than the low free disk space threshold (FreeHardDiskThreshold).

Insufficient Free Physical Memory—The system is running out of physical memory resources, and the free physical memory is less than the FreePhysicalMemoryThreshold value.

Insufficient Free Virtual Memory—The system is running out of virtual memory resources, and the free virtual memory is less than the FreeVirtualMemoryThreshold value.
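The resource faults above differ in direction: CPU utilization is compared against an upper limit, while free disk space and free memory are compared against lower limits. A minimal sketch of that logic follows. The threshold names come from the fault table, but the values, the units (percent), the sample keys, and the function name `resource_faults` are assumptions made for illustration.

```python
# Hypothetical settings: names are from the fault table, values are assumed.
THRESHOLDS = {
    "ProcessUtilizationThreshold": 90,  # upper limit, percent of CPU capacity
    "FreeHardDiskThreshold": 10,        # lower limit (assumed unit: percent)
    "FreePhysicalMemoryThreshold": 15,  # lower limit (assumed unit: percent)
    "FreeVirtualMemoryThreshold": 15,   # lower limit (assumed unit: percent)
}


def resource_faults(sample, thresholds=THRESHOLDS):
    """Return the resource faults raised by one polled sample."""
    faults = []
    # CPU utilization is a fault when it rises above its threshold.
    if sample["cpu_utilization"] > thresholds["ProcessUtilizationThreshold"]:
        faults.append("High CPU Utilization")
    # Free-resource values are faults when they drop below their thresholds.
    if sample["free_disk"] < thresholds["FreeHardDiskThreshold"]:
        faults.append("Insufficient Free Disk Space")
    if sample["free_physical_memory"] < thresholds["FreePhysicalMemoryThreshold"]:
        faults.append("Insufficient Free Physical Memory")
    if sample["free_virtual_memory"] < thresholds["FreeVirtualMemoryThreshold"]:
        faults.append("Insufficient Free Virtual Memory")
    return faults


sample = {
    "cpu_utilization": 95,
    "free_disk": 40,
    "free_physical_memory": 8,
    "free_virtual_memory": 30,
}
print(resource_faults(sample))
```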

Temperature Exceptions

VHM generates a temperature exception for multiple occurrences of the following faults (for a complete list of media server faults, see Table 3-1):

System Temperature Sensor Down/Degraded—The temperature sensor is reporting abnormal temperature measurements. The possible conditions are OK, Degraded, and Failed.

System Temperature High—The operating temperature is higher than the TemperatureCelsiusThreshold value.

System Fan State Down/Degraded—The system fan condition is not normal. The possible conditions are OK, Degraded, and Failed.

CPU Fan Status Down/Degraded—The CPU fan condition is not normal. The possible conditions are OK, Degraded, and Failed.

Power Supply Exceptions

VHM generates a power supply exception for multiple occurrences of the following faults (for a complete list of media server faults, see Table 3-1):

System Battery State is Down/Degraded—Remote Insight Board battery status is not normal. It is either Degraded or Failed.

RIB Battery Status Down/Degraded—Remote Insight Board battery status is not normal. It is either Degraded or Failed.

RIB Battery Charge Status Down/Degraded—Remote Insight Board battery charge (for MCS-7830 with RIB card) is not normal. It is either Degraded or Failed.

Power Supply State Down/Degraded—The power supply is not in a normal state. The possible states are OK, Degraded, and Failed.

Application Exceptions

VHM generates an application exception for multiple occurrences of the following fault (for a complete list of media server faults, see Table 3-1):

Application Down—Application is not running.

Application Monitor Exceptions

VHM generates an application monitor exception for multiple occurrences of the following faults (for a complete list of media server faults, see Table 3-1):

Synthetic Phone Registration Failed—Phone registration test failed during a synthetic transaction for a Cisco CallManager (CCM).

Synthetic Off Hook Transaction Failed—Off-hook test failed during a synthetic transaction for a CCM.

Synthetic Empty Call Failed—Empty call test failed during a synthetic transaction for a CCM.

Synthetic End to End Call Failed—End-to-end call test failed during a synthetic transaction for a CCM.

Synthetic TFTP Download Failed—TFTP download test failed during a synthetic transaction for a CCM.

Synthetic Phone to Gateway Call Failed—No device picked up a test call to a gateway during a synthetic transaction for a CCM.

Synthetic Phone to Bridge Call Failed—No device picked up a test call to a conference bridge during a synthetic transaction for a CCM.

Synthetic Transaction Server Down—The synthetic transaction server is down. Synthetic transaction tests are not being run.

ICS 7750 Faults

Table 3-2 lists the ICS 7750 faults diagnosed by VHM.

Table 3-2 ICS 7750 Faults

Managed Entity | Faults | Thresholds | Notification
ICS 7750 | Unresponsive | (none) | Operational exception
ICS 7750 | Power Supply Down | (none) | Power supply exception
Primary SPE/Secondary SPE/Application SPE | CallManager Down | (none) | Application exception
Primary SPE/Secondary SPE/Application SPE | Insufficient Free Disk Space; Insufficient Free Physical Memory; Insufficient Free Virtual Memory | FreeHardDiskThreshold; FreeVirtualMemoryThreshold | Resource exception
Primary SPE/Secondary SPE/Application SPE | Power Supply State Down | (none) | (none listed)
Primary SPE/Secondary SPE/Application SPE | Interface Operationally Down | (none) | Operational exception
Primary SPE/Secondary SPE/Application SPE | Unresponsive | (none) | (none listed)
Multiservice Route Processor (MRP) | High Utilization; Unresponsive; Insufficient Free Physical Memory | ProcessUtilizationThreshold; FreePhysicalMemoryThreshold | Resource exception
Multiservice Route Processor (MRP) | Interface Operationally Down | (none) | Operational exception
System Switch Processor | High Utilization; Insufficient Free Memory; Unresponsive | ProcessUtilizationThreshold; FreePhysicalMemoryThreshold | (none listed)

ICS 7750 faults are described in more detail, grouped by notification type, in the following topics:

Operational Exceptions

Resource Exceptions

Power Supply Exceptions

Application Exceptions

Operational Exceptions

VHM generates an operational exception for multiple occurrences of the following faults (for a complete list of ICS 7750 faults, see Table 3-2):

Unresponsive—If one of the ICS 7750 entities (for example, the media server, trunk card, or BPS) is down, a fault is generated.

Interface Operationally Down—The interface status is down.

Resource Exceptions

VHM generates a resource exception for multiple occurrences of the following faults (for a complete list of ICS 7750 faults, see Table 3-2):

High Utilization—The processor utilization exceeds the processor utilization threshold. ProcessUtilizationThreshold defines the upper limit for CPU utilization and is expressed as a percentage of total CPU capacity.

Insufficient Free Disk Space—The free disk space is less than the low free disk space threshold (FreeHardDiskThreshold).

Insufficient Free Physical Memory—The system is running out of physical memory resources, and the free physical memory is less than the FreePhysicalMemoryThreshold value.

Insufficient Free Virtual Memory—The system is running out of virtual memory resources, and the free virtual memory is less than the FreeVirtualMemoryThreshold value.

Power Supply Exceptions

VHM generates a power supply exception for multiple occurrences of the following faults (for a complete list of ICS 7750 faults, see Table 3-2):

Power Supply State Down/Degraded—The power supply is not in a normal state. The possible states are OK, Degraded, and Failed.

Application Exceptions

VHM generates an application exception for multiple occurrences of the following fault (for a complete list of ICS 7750 faults, see Table 3-2):

CallManager Down—Application is not running.

Voice Gateway and Inline Power Switch Faults

Table 3-3 lists the Voice Gateway and Inline Power Switch faults diagnosed by DFM and further processed by VHM.

Table 3-3 Voice Gateway and Inline Power Switch Faults

Managed Entities | Faults | Thresholds | Notification
Skinny Voice Gateway | Interface Operationally Down | (none) | Operational exception
Skinny Voice Gateway | Lost Contact with CallManager | (none) | Connectivity exception
Voice Gateway; Inline Power Switch | Unresponsive; SNMP Agent Unresponsive; Voice Port Operationally Down; Interface Operationally Down | (none) | Operational exception
Voice Gateway; Inline Power Switch | Voice Port Administratively Down; Interface Administratively Down | (none) | (none listed)
Voice Gateway; Inline Power Switch | High Utilization (CPU); Insufficient Free Memory | ProcessUtilizationThreshold; FreePhysicalMemoryThreshold | Resource exception
Voice Gateway; Inline Power Switch | Temperature Sensor Degraded; Temperature Sensor Down; Fan Down; Fan Degraded | (none) | Temperature exception
Voice Gateway; Inline Power Switch | Power Supply Degraded; Power Supply Down | (none) | Power supply exception


Voice Gateway and Inline Power Switch faults are described in more detail, grouped by notification type, in the following topics:

Operational Exceptions

Resource Exceptions

Temperature Exceptions

Power Supply Exceptions

Connectivity Exceptions

Operational Exceptions

VHM generates an operational exception for multiple occurrences of the following faults (for a complete list of Voice Gateway and Inline Power Switch faults, see Table 3-3):

Unresponsive—The device is not reachable. ICMP pings sent by the VHM server timed out without a response.

SNMP Agent Unresponsive—The device is not responding to SNMP requests. ICMP pings succeed, but SNMP requests time out.

Interface Operationally Down—A voice interface is down.

Interface Administratively Down—A voice interface has been shut down administratively.

Voice Port Operationally Down—A voice port is down.

Voice Port Administratively Down—A voice port has been shut down administratively.
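The difference between the Unresponsive and SNMP Agent Unresponsive faults comes down to which probe fails. The sketch below shows only that decision. It performs no network access, and the function name and boolean inputs are hypothetical stand-ins for the results of an ICMP ping and an SNMP request.

```python
def reachability_fault(icmp_ok, snmp_ok):
    """Map probe results to a fault name, or None when both probes succeed."""
    if not icmp_ok:
        # The device did not answer ICMP pings at all.
        return "Unresponsive"
    if not snmp_ok:
        # The device answers pings, but SNMP requests time out.
        return "SNMP Agent Unresponsive"
    return None


print(reachability_fault(icmp_ok=True, snmp_ok=False))
```

Checking ICMP first means a device that answers neither probe is reported as Unresponsive rather than as an SNMP agent problem, which is the ordering this sketch assumes.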

Resource Exceptions

VHM generates a resource exception for multiple occurrences of the following faults (for a complete list of Voice Gateway and Inline Power Switch faults, see Table 3-3):

High Utilization—The processor utilization exceeds the CPU utilization threshold. ProcessUtilizationThreshold defines the upper limit for CPU utilization and is expressed as a percentage of total CPU capacity.

Insufficient Free Memory—The system is running out of memory resources, and the free physical memory is less than the FreePhysicalMemoryThreshold value.

Temperature Exceptions

VHM generates a temperature exception for multiple occurrences of the following faults (for a complete list of Voice Gateway and Inline Power Switch faults, see Table 3-3):

Temperature Sensor Degraded—The temperature sensor condition is Degraded.

Temperature Sensor Down—The temperature sensor condition is Failed.

Fan Degraded—The fan condition is Degraded.

Fan Down—The fan condition is Failed.
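The four temperature faults above follow one pattern: a Degraded condition produces a "Degraded" fault and a Failed condition produces a "Down" fault. A small sketch of that mapping follows, assuming an OK condition raises no fault. The enum and function names are hypothetical.

```python
from enum import Enum


class Condition(Enum):
    OK = "OK"
    DEGRADED = "Degraded"
    FAILED = "Failed"


# A Failed condition is reported as a "Down" fault.
FAULT_SUFFIX = {Condition.DEGRADED: "Degraded", Condition.FAILED: "Down"}


def component_fault(component, condition):
    """Return the fault name for a component condition, or None when OK."""
    suffix = FAULT_SUFFIX.get(condition)
    if suffix is None:
        return None
    return f"{component} {suffix}"


print(component_fault("Temperature Sensor", Condition.FAILED))
print(component_fault("Fan", Condition.DEGRADED))
```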

Power Supply Exceptions

VHM generates a power supply exception for multiple occurrences of the following faults (for a complete list of Voice Gateway and Inline Power Switch faults, see Table 3-3):

Power Supply Degraded—The power supply is not in a normal state. The state is Degraded.

Power Supply Down—The power supply is not in a normal state. The state is Down.

Connectivity Exceptions

VHM generates a connectivity exception for multiple occurrences of the following fault (for a complete list of Voice Gateway and Inline Power Switch faults, see Table 3-3):

Lost Contact with CallManager—The Skinny voice interface lost contact with the Cisco CallManager (CCM).