Prime Central for Hosted Collaboration Solution 9.2.1 User Guide
Appendix A: Understanding Common Service Faults

Understanding Common Service Faults


The Service Impact Analysis (SIA) function is performed by Service Models defined within the Prime Central for HCS SV server. The SV server, however, relies on the event enrichment function, which is performed primarily by the CE server in conjunction with the HCM-Fulfillment SDR. The following terms are used in this chapter:

Service: Voice, VoiceMail, and Presence are the types of services deployed by HCS customers.

Impact: The service state can be Up, Down, or Marginal. A Marginal state indicates that the service is at risk. For example, if a CUCM node goes down, phones register with the standby node, so the service remains up but is considered at risk. The Prime Central for HCS SIA takes the following into consideration:

The level of application-level redundancy deployed by the customer, and the fault location (for example, customer-specific components as opposed to common components).

The initial service state. For example, the first VM failure could change the service state from Up to Marginal. A subsequent VM failure for the same customer could change the service state from Marginal to Down if the initial problem is not addressed.

Service dependency. For example, the VoiceMail and Presence services rely on the Voice service provided by CUCM, so if the Voice service goes Marginal or Down, it can affect the VoiceMail and Presence services as well.

Scope: Scope defines the level of impact of a single fault, which could affect a single device, user, location, or customer, multiple customers using the same blade or chassis, or a data center. Prime Central for HCS SIA is currently limited to the customer level. Support for finer scopes such as location, user, and device level is under consideration for future releases.
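The state changes described under Impact can be sketched in a few lines of Python. This is an illustrative model only, not the product's Service Model: the names `ServiceState`, `next_state`, and `dependent_state` and the `redundancy_left` flag are assumptions made for the example, and treating a dependent service as no healthier than the service it relies on is a simplification of "has the potential to affect".

```python
from enum import IntEnum


class ServiceState(IntEnum):
    """Service states named in this chapter, ordered from best to worst."""
    UP = 0
    MARGINAL = 1
    DOWN = 2


def next_state(current: ServiceState, redundancy_left: bool) -> ServiceState:
    """Return the service state after one more unresolved fault.

    redundancy_left is a hypothetical flag: True when a standby node can
    still carry the service after this fault.
    """
    if current is ServiceState.DOWN:
        return ServiceState.DOWN
    if redundancy_left:
        return ServiceState.MARGINAL  # service still up, but at risk
    return ServiceState.DOWN


def dependent_state(own: ServiceState, upstream: ServiceState) -> ServiceState:
    """Assumption: a dependent service is no healthier than its upstream service."""
    return max(own, upstream)


# First VM failure: the standby node takes over, so Voice becomes Marginal.
voice = next_state(ServiceState.UP, redundancy_left=True)
# VoiceMail relies on Voice, so it is reported as Marginal as well.
voicemail = dependent_state(ServiceState.UP, voice)
# Second VM failure with no standby left: Voice goes Down.
voice_after_second = next_state(voice, redundancy_left=False)
```

The ordering of the enum values is what lets `max` pick the worse of the two states.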

Table A-1 lists a few typical faults with associated impact and scope.

Table A-1 Typical Faults in HCS System

| Type of fault | Services affected | Impact | Scope | Notes |
| --- | --- | --- | --- | --- |
| Process failure | Related service hosted by VM with process failure | Marginal/Down | Customer | Deploy application-level redundancy |
| VM failure | Related service hosted by affected VM | Marginal/Down | Customer | Deploy application-level redundancy/VMware HA to minimize impact |
| Host failure | Related services hosted by all VMs deployed on host | Marginal/Down | Multiple customers | Deploy VMware HA/distribute apps for a given customer across different hosts to minimize impact |
| Blade failure | Related services hosted by all VMs deployed on host | Marginal/Down | Multiple customers | Deploy VMware HA/distribute apps for a given customer across different hosts to minimize impact |
| Chassis failure | Related services hosted by all VMs deployed on chassis | Marginal/Down | Multiple customers | Deploy VMware HA cluster across multiple chassis to minimize impact |
| CPE router failure | All services on location | Marginal/Down | Customer location (in a future release) | Deploy SRST/redundant connectivity options. SIA not currently supported by Prime Central for HCS |
| WAN connectivity failure | All services on location | Marginal/Down | Customer location (in a future release) | Deploy SRST/redundant connectivity options. SIA not currently supported by Prime Central for HCS |
| CUBE failure | Offnet services | Marginal/Down | Multiple customers | Deploy CUBE HA. SIA support is in Prime Central for HCS 9.2.1 |


RCA Correlation Tree

This section explains the generic RCA correlation tree that applies to the failure scenarios described in this appendix.

For example, when a UCS Chassis failure occurs, the UCS Chassis failure event is marked as the root cause. UCS Blade failure events correlate to the UCS Chassis failure events, ESXi host failure events correlate to UCS Blade failure events and so on.

Note that it takes a few minutes for the correlation tree to converge, because the tree is computed and updated as events arrive. For example, if the VM failure events are seen before the ESXi host failure events, the VM failure events are first marked as root causes. When the ESXi host events are seen later, the ESXi host events are marked as root causes and the VM failure events are re-marked as symptoms.
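The convergence behavior can be illustrated with a short Python sketch. It is a simplification under stated assumptions, not the product's correlation engine: the layer names and ranks in `LAYER_RANK` and the function `classify` are hypothetical, and all events are assumed to belong to the same chassis, blade, host, and VM chain, so topology lookups are left out.

```python
# Layers from the example above, ordered from most to least fundamental.
# The names and numeric ranks are illustrative assumptions.
LAYER_RANK = {"chassis": 0, "blade": 1, "esxi_host": 2, "vm": 3}


def classify(events):
    """Label each event seen so far as 'root cause' or 'symptom'.

    events is a list of (layer, name) tuples in arrival order. The event on
    the most fundamental layer seen so far is the root cause; every other
    event is treated as a symptom of it.
    """
    if not events:
        return {}
    top = min(LAYER_RANK[layer] for layer, _ in events)
    return {
        name: "root cause" if LAYER_RANK[layer] == top else "symptom"
        for layer, name in events
    }


seen = []
history = []
# The VM failure arrives first; the ESXi host failure arrives later.
for event in [("vm", "VM failure"), ("esxi_host", "ESXi host failure")]:
    seen.append(event)
    history.append(classify(seen))  # recompute the labels on every arrival
```

After the first arrival the VM failure is labeled a root cause; after the second, the ESXi host failure takes that label and the VM failure becomes a symptom, which mirrors the re-marking described above.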

Common Failures

This section documents the use cases (UC) of events observed in Prime Central for HCS during common service faults.

There are many ways to trigger the same event. The exact events that Prime Central for HCS receives may vary depending on how the fault is triggered and on the environment. The examples below illustrate how to use Prime Central for HCS to identify root cause and service impact events for specific faults. This section contains the following topics:

UC1 - VMware ESXi Host Failure - CUCM

UC2 - VMware ESXi Host Failure - CUCxn

UC3 - UCS Blade Failure - CUCM

UC4 - UCS Blade Failure - CUCxn

UC5 - Application Cold Failure - CUCM

UC6 - Application Cold Failure - CUCxn

UC7 - Changing the Number of Registered Gateways and Media Devices

UC8 - TFTP Server for UC Services - Critical Processes Failure

UC9 - Detecting and Correlating Customer Voice Quality Degradation

UC11 - VMware VM Failure - CUCM

UC12 - CUCM Clustering Problems

UC13 - Change in Number of Registered Phones

UC15 - CUCxn Critical Process Failure

UC16 - VMware VM Failure - CUCxn

UC17 - CUCxn Clustering Problems

UC18 - CUCM Critical Process Failure

UC19 - UCS Chassis Failure - CUCM

UC20 - UCS Chassis Failure - CUCxn

UC21 - Insufficient Virtual Memory

UC22 - CPU Utilization Problems

UC23 - Call Throttling Failures (Code Red)

UC24 - Call Throttling Failures (Code Yellow)

UC25 - Route List Exhausted

UC26 - Media List Exhausted

UC27 - High Resource Utilization by all Customer Sites

UC28 - Memory, CPU, Disk Threshold Exceeded - CUCxn

UC29 - Low Number Of Available Licenses - CUCxn

UC30 - VM Resources - Memory

UC31 - VM Resources - CPU

UC32 - VM Resources - Disk usage

UC33 - VM Resources - CPU ready time

UC34 - VM Resources - Disk latency

UC35 - ASR1K - Chassis Failure

UC36 - ASR1K - Power Supply/Fan Failure

UC37 - ASR1K - RP/ES/SPA Failure

UC38 - SIP Trunk from Leaf to CUBE-SP - Loss of SIP Trunk

UC39 - CUBE-SP Adjacency Status

UC40 - Voice Quality Degradation

UC41 - CUBE - SP Security Violation

UC42 - CUBE-SP Resource Performance Degradation

UC43 - CUBE-SP SLA Violation

CP1 - CUCMIP Critical Processes Failure

CP2 - Application Cold Failure - CUCMIP

CP3 - VMware VM Failure - CUCMIP

CP4 - CUCMIP VMware ESXi Host Failure

CP5 - CUCMIP UCS Blade Failure

CP6 - CUCMIP UCS Chassis Failure

CP7 - Application Resources Degradation - CUCMIP

CP11 - IM Resources Exceeded - CUCMIP

UC1 - VMware ESXi Host Failure - CUCM

This use case describes the events that Prime Central for HCS receives if the VMware ESXi Host fails. This type of failure generates both Root Cause (RC) and Service Impact (SI) events. The CUCM VM is automatically brought up in another host if HA is enabled on the cluster. If HA is not configured for the cluster, CUCM nodes stay down until the ESXi host is recovered.

Observed RC-EL Events

When the ESXi host shuts down, many synthetic RCA events are observed, including VC_Host_Avlblty, VC_VM_Avlblty, UCS_BladeLinks, OM_CUCM_Redundancy, and OM_CUCM_Registration. Eventually, only one primary synthetic RCA event, VC_Host_Avlblty, remains, along with the following two events:

OM_CUCM_Registration, which is triggered when the VM moves to the new ESXi host.

UCS_BladeLinks, which is a sibling event of VC_Host_Avlblty in the correlation tree.

Table A-2 Observed Root Cause Events for UC1

| Severity | EventTypeID | Summary |
| --- | --- | --- |
| Critical | VC_Host_Avlblty | Synthetic Event for VC_Host_Avlblty group events from 10.11.3.152 |
| Warning | OM_CUCM_Registration | Synthetic Event for OM_CUCM_Registration group events from CUCM-CL-C071-1 |
| Major | UCS_BladeLinks | Synthetic Event for UCS_BladeLinks group events from 10.11.2.8 |


Observed SI-EL Events

CUCM voice service impacts voice mail and presence.

Table A-3 Observed Service Events for UC1

| Severity | Summary |
| --- | --- |
| Minor | Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C071-1 is Marginal |
| Minor | Overall Attribute of the Customer_Presence_Service_Template tag of CUP-CL-C071-1 is Marginal |
| Minor | Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C071-1 is Marginal |
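The service event summaries in Table A-3 follow a regular pattern, so they can be split into template, cluster, and state when they are exported for reporting. The sketch below is an assumption-laden example rather than a product API: `SI_SUMMARY` and `parse_service_event` are hypothetical names, and the state list (Up, Marginal, Down) is taken from the terms defined earlier in this chapter, not from an exhaustive list of summary strings.

```python
import re

# Pattern inferred from the summaries in Table A-3; other summary formats may exist.
SI_SUMMARY = re.compile(
    r"Overall Attribute of the (?P<template>\w+) tag of "
    r"(?P<cluster>[\w-]+) is (?P<state>Up|Marginal|Down)"
)


def parse_service_event(summary):
    """Return template, cluster, and state from an SI event summary, or None."""
    match = SI_SUMMARY.search(summary)
    return match.groupdict() if match else None


parsed = parse_service_event(
    "Overall Attribute of the Customer_Presence_Service_Template tag of "
    "CUP-CL-C071-1 is Marginal"
)
```

Returning `None` for summaries that do not match keeps the helper safe to run over mixed event lists.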


Observed Other-EL Events

Prime Central for HCS does not analyze these events, but they could point to potential root causes for the impacted services.

Table A-4 Observed Other Events for UC1

| Severity (S)/Customer (C)/Node (N) | EventName (EN)/EventTypeId (ET) | Summary |
| --- | --- | --- |
| S = Warning; C = C071; N = CUCM-71-pub | EN = KVM_VM_RestartOnAlt_Host_Cisco; ET = VC_VM_Restored | Virtual machine CUCM-71-pub was restarted on 10.11.3.148 since 10.11.3.152 failed. Message: KVM_VM_RestartOnAlt_Host_Cisco[(Event_Type=N"VmRestartedOnAlternateHostEvent" ON VM:cisco-10.11.3.148:ESX ON 30522010 (Event_Type=VmRestartedOnAlternateHostEvent)] |
| S = Major; N = 10.11.2.8 | | Link Down (vethernet1060) |
| S = Major; N = 10.11.2.8 | | Network Interface (ifIndex = 486548517) Down, should be Up (ifEntry.486548517) |
| S = Major; N = 10.11.2.9 | | Link Down (vethernet1059) |
| S = Major; N = 10.11.2.8 | | Link Down (vethernet9254) |
| S = Major; N = 10.11.2.8 | | Network Interface (ifIndex = 486540323) Down, should be Up (ifEntry.486540323) |
| S = Major; N = 10.11.2.8 | | Network Interface (ifIndex = 503317541) Administratively Down (ifEntry.503317541) |
| S = Major; N = 10.11.2.9 | | Link Down (vethernet9253) |
| S = Minor; N = 10.11.2.9 | | Network Interface (ifIndex = 503317540) Administratively Down (ifEntry.503317540) |
| S = Major; N = 10.11.2.9 | | Network Interface (ifIndex = 486540322) Down, should be Up (ifEntry.486540322) |
| S = Major; N = 10.11.2.9 | | Network Interface (ifIndex = 486548516) Down, should be Up (ifEntry.486548516) |
| S = Major; N = 10.11.2.8 | | Link Down (Ethernet5/1/2) |
| S = Major; N = 10.11.2.9 | | Network Interface (ifIndex = 520355904) Down, should be Up (ifEntry.520355904) |
| S = Major; N = 10.11.2.9 | | Link Down (Ethernet5/1/2) |
| S = Major; N = 10.11.2.8 | | Network Interface (ifIndex = 520355904) Down, should be Up (ifEntry.520355904) |


Service Tree Event Overlay Location and Content

SIA events are overlaid on the Service Tree in the Service Availability view.

Table A-5 Observed Service Tree Events for UC1

| Location | Summary |
| --- | --- |
| ... -> Voice Service | Meta event for Voice Service - C071 |
| ... -> Cluster_Availability --> Internode_Trunks | SDL Link Out Of Service::Component= 192.6.4.124-192.6.4.123; Local Application ID= CCM; Remote Node ID= 1; Unique Link ID= 2:100:1:100; Remote Application IP Address= 192.6.4.123; Local Node ID= 2; Remote Application ID= CCM; Default Event Name= SDL Link Out Of Service; DescriptionURL= < >; |
| ... -> CUCM-71-pub | PerformancePollingStopped::Component= cucm-71-pub.customer.com; Error Message String= 27-Jun-2012 16:23:59 EDT,cucm-71-pub.customer.com,192.6.4.123,Cannot collect data. The device is experiencing communication problems. Device may be in partially monitored state. Check HTTP(S) credentials.; Default Event Name= PerformancePollingStopped; DescriptionURL= < >; |
| ... -> CUCM-71-pub | DeviceRestarted::Component= cucm-71-pub.customer.com; Default Event Name= DeviceRestarted; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=DeviceRestarted >; |
| ... -> CUCM-71-sub | ServiceDown::Component= VS-cucm-71-sub.customer.com/Cisco DRF Local; ProductName= Cisco DRF Local; CurrentState= Stopped; Default Event Name= ServiceDown; DescriptionURL= < >; |
| ... -> CUCM-71-pub --> VM Resources | The virtual machine guest memory usage is high on CUCM-71-pub. Message: KVM_VM_Guest_Memory_Util_High[(Guest_Util>40) ON VM:cisco-10.11.3.148:ESX ON CUCM-71-pub (Guest_Util=75)] |
| ... -> Call Control --> Registration | Number Of Registered MediaDevices Increased::Component= VE-CUCM-CLC071-1-RTMTSyslog-Id#1340828781555; Detail= Number of registered Media Devices increased in consecutive polls. Current monitored precanned object has increased by 3 The alert is generated on Wed Jun 27 16:26:22 EDT 2012 on cluster CUCM-CL-C071-1.] ClusterID=: RTMT Alert; Default Event Name= Number Of Registered MediaDevices Increased; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=NumberOfRegisteredMediaDevicesIncreased >; |
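Many of the summaries in Table A-5 have the shape `EventName::Key= Value; Key= Value;`. When such summaries are exported, a small helper can turn them into a name and a dictionary of fields. The following is a minimal sketch, assuming that field values contain no semicolons; `parse_event_summary` is a hypothetical helper, not part of Prime Central for HCS.

```python
def parse_event_summary(summary):
    """Split 'Name::Key= Value; Key= Value;' into (name, fields).

    Assumption: values do not contain ';'. An '=' inside a value (for
    example in a URL query string) is preserved, because each field is
    split on its first '=' only.
    """
    name, _, rest = summary.partition("::")
    fields = {}
    for part in rest.split(";"):
        key, sep, value = part.partition("=")
        if sep:  # skip empty fragments such as the text after the last ';'
            fields[key.strip()] = value.strip()
    return name.strip(), fields


name, fields = parse_event_summary(
    "ServiceDown::Component= VS-cucm-71-sub.customer.com/Cisco DRF Local; "
    "ProductName= Cisco DRF Local; CurrentState= Stopped; "
    "Default Event Name= ServiceDown; DescriptionURL= < >;"
)
```

Summaries that are free text, such as the VM memory message, have no `::` separator; for those the helper returns the whole text as the name and an empty field dictionary.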


Next Steps


Step 1 The VM on the host is automatically brought up in another host through HA.

Step 2 The original host is brought back using the following steps:

a. Select UCS Manager > Service Profiles > root > 862-10-c5b2 > Boot Server.

b. Troubleshoot and resolve the ESXi Host issue.

c. Drag and drop CUCM-71-pub from the host that it moved to back to ESXi Host (10.11.3.152).

d. Clear any alarms on the CUCM VM.


UC2 - VMware ESXi Host Failure - CUCxn

This use case describes the events that Prime Central for HCS receives if the VMware ESXi host fails. This type of incident generates both Root Cause (RC) and Service Impact (SI) events. The CUCxn VM is automatically brought up in another host if HA is enabled on the cluster. If HA is not configured for the cluster, CUCxn nodes stay down until the ESXi Host is recovered.

Observed RC-EL Events

When the ESXi host shuts down, numerous synthetic RCA events are observed, including VC_Host_Avlblty, VC_VM_Avlblty, UCS_BladeLinks, and OM_CUCxn_OM_Connectivity. Eventually, there is only one primary synthetic RCA event of VC_Host_Avlblty.

Table A-6 Observed Root Cause Events for UC2

| Severity | EventTypeID | Summary |
| --- | --- | --- |
| Critical | VC_Host_Avlblty | Synthetic Event for VC_Host_Avlblty group events from 10.11.3.152 |
| Major | UCS_BladeLinks | Synthetic Event for UCS_BladeLinks group events from 10.11.2.8 |


Observed SI-EL Events

CUCxn voice mail service is impacted.

Table A-7 Observed Service Events for UC2

| Severity | Summary |
| --- | --- |
| Minor | Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C071-1 is Marginal. |


Observed Other-EL Events

Prime Central for HCS does not analyze these events, but they could point to potential root causes for impacted services.


Note 10.11.2.8 and 10.11.2.9 are the IP addresses of UCS6140 side A and UCS6140 side B, respectively.


Table A-8 Observed Other Events for UC2

| Severity (S)/Customer (C)/Node (N) | EventName (EN)/EventTypeId (ET) | Summary |
| --- | --- | --- |
| S = Warning; C = C071; N = CUCxn-71-pub | EN = KVM_VM_RestartOnAlt_Host_Cisco; ET = VC_VM_Restored | Virtual machine CUCxn-71-pub was restarted on 10.11.3.141 since 10.11.3.152 failed. Message: KVM_VM_RestartOnAlt_Host_Cisco[(Event_Type=N"VmRestartedOnAlternateHostEvent" ON VM:cisco-10.11.3.141:ESX ON 31536476 (Event_Type=VmRestartedOnAlternateHostEvent)] |
| S = Major; N = 10.11.2.8 | | Link Down (vethernet1060) |
| S = Major; N = 10.11.2.8 | | Network Interface (ifIndex = 486540323) Down, should be Up (ifEntry.486540323) |
| S = Major; N = 10.11.2.9 | | Link Down (vethernet1059) |
| S = Major; N = 10.11.2.8 | | Link Down (vethernet9254) |
| S = Major; N = 10.11.2.8 | | Network Interface (ifIndex = 486540323) Down, should be Up (ifEntry.486540323) |
| S = Major; N = 10.11.2.8 | | Network Interface (ifIndex = 503317541) Administratively Down (ifEntry.503317541) |
| S = Major; N = 10.11.2.9 | | Link Down (vethernet9253) |
| S = Major; N = 10.11.2.9 | | Network Interface (ifIndex = 503317540) Administratively Down (ifEntry.503317540) |
| S = Major; N = 10.11.2.9 | | Network Interface (ifIndex = 486540322) Down, should be Up (ifEntry.486540322) |
| S = Major; N = 10.11.2.9 | | Network Interface (ifIndex = 486548516) Down, should be Up (ifEntry.486548516) |
| S = Major; N = 10.11.2.9 | | Link Down (Ethernet5/1/2) |
| S = Major; N = 10.11.2.9 | | Network Interface (ifIndex = 520355904) Down, should be Up (ifEntry.520355904) |
| S = Major; N = 10.11.2.9 | | Link Down (Ethernet5/1/2) |
| S = Major; N = 10.11.2.9 | | Network Interface (ifIndex = 520355904) Down, should be Up (ifEntry.520355904) |


Service Tree Event Overlay Location and Content

SIA events are overlaid on the Service Tree in the Service Availability view.

Table A-9 Observed Service Tree Events for UC2

| Location | Summary |
| --- | --- |
| ... -> CUCxn-71-pub | PerformancePollingStopped::Component= cucxn-71-pub.customer.com; Error Message String= 06-Jul-2012 10:20:43 EDT,cucxn-71-pub.customer.com,192.6.4.125,Cannot collect data. The device returned no data from a required MIB.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=PerformancePollingStopped >; |


Next Steps


Step 1 The VM on the host is automatically brought up in another host through HA.

Step 2 The original host is brought back using the following steps:

a. Select UCS Manager > Service Profiles > root > 862-10-c5b2 > Boot Server.

b. Troubleshoot and resolve the ESXi Host issue.

c. Drag and drop CUCxn-71-pub from the host that it moved to back to ESXi Host (10.11.3.152).

d. Clear any alarms on the CUCxn VM.


UC3 - UCS Blade Failure - CUCM

This fault generates both Root Cause (RC) and Service Impact (SI) events. The CUCM VM is automatically brought up in another host if HA is enabled on the cluster. If HA is not configured for the cluster, CUCM nodes stay down until the UCS blade is replaced.

Observed RC-EL Events

When the UCS blade fails, numerous synthetic RCA events are observed, including VC_Host_Avlblty, VC_VM_Avlblty, UCS_Blade_Avlblty, UCS_BladeLinks, and OM_CUCM_OM_Connectivity. Eventually, there is only one primary synthetic RCA event of UCS_Blade_Avlblty.

Table A-10 Observed Root Cause Events for UC3

| Severity | EventTypeID | Summary |
| --- | --- | --- |
| Critical | UCS_Blade_Avlblty | Synthetic Event for UCS_Blade_Avlblty group events from 10.11.2.10 |


Observed SI-EL Events

CUCM voice service impacts voice mail and presence.

Table A-11 Service Events Observed for UC3 - UCS Blade Failure - CUCM

| Severity | Summary |
| --- | --- |
| Minor | Overall Attribute of the Customer_Presence_Service_Template tag of CUP-CL-C071-1 is Marginal. |
| Minor | Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C071-1 is Marginal. |
| Minor | Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C071-1 is Marginal. |


Observed Other-EL Events

Prime Central for HCS does not analyze these events, but they could point to potential root causes for the impacted services.


Note 10.11.2.8, 10.11.2.9, and 10.11.2.10 are the IP addresses of UCS6140 side A, UCS6140 side B, and UCSM, respectively.


Table A-12 Other Events Observed for UC3 - UCS Blade Failure - CUCM

| Severity (S)/Customer (C)/Node (N) | EventName (EN)/EventTypeId (ET) | Summary |
| --- | --- | --- |
| S = Warning; C = C071; N = CUCM-71-pub | EN = KVM_VM_RestartOnAlt_Host_Cisco; ET = VC_VM_Restored | Virtual machine CUCM-71-pub was restarted on 10.11.3.141 since 10.11.3.152 failed. Message: KVM_VM_RestartOnAlt_Host_Cisco[(Event_Type=N"VmRestartedOnAlternateHostEvent" ON VM:cisco-10.11.3.141:ESX ON 31551556 (Event_Type=VmRestartedOnAlternateHostEvent)] |
| S = Indeterminate; N = 10.11.2.10 | EN = fltEquipmentFanPerfThresholdNonCritical; ET = default | Fan 2 in Fan Module 3/1-1 speed: upper-noncritical(FaultCode:fltEquipmentFanPerfThresholdNonCritical |
| S = Major; N = 10.11.2.8 | | Link Down (vethernet1060) |
| S = Major; N = 10.11.2.8 | | Link Down (vethernet1059) |
| S = Major; N = 10.11.2.8 | | Network Interface (ifIndex = 486540323) Down, should be Up (ifEntry.486540323) |
| S = Major; N = 10.11.2.8 | | Network Interface (ifIndex = 486548517) Down, should be Up (ifEntry.486548517) |
| S = Major; N = 10.11.2.9 | | Network Interface (ifIndex = 486548516) Down, should be Up (ifEntry.486548516) |
| S = Major; N = 10.11.2.8 | | Network Interface (ifIndex = 503317541) Administratively Down (ifEntry.503317541) |
| S = Major; N = 10.11.2.9 | | Network Interface (ifIndex = 503317540) Administratively Down (ifEntry.503317540) |
| S = Major; N = 10.11.2.9 | | Network Interface (ifIndex = 486540322) Down, should be Up (ifEntry.486540322) |
| S = Major; N = 10.11.2.9 | | Link Down (vethernet9253) |
| S = Major; N = 10.11.2.8 | | Link Down (vethernet9254) |
| S = Major; N = 10.11.2.9 | | Network Interface (ifIndex = 520355904) Down, should be Up (ifEntry.520355904) |
| S = Major; N = 10.11.2.9 | | Link Down (Ethernet5/1/2) |
| S = Major; N = 10.11.2.8 | | Network Interface (ifIndex = 520355904) Down, should be Up (ifEntry.520355904) |
| S = Major; N = 10.11.2.8 | | Link Down (Ethernet5/1/2) |


Service Tree Event Overlay Location and Content

SIA events are overlaid on the Service Tree in the Service Availability view.

Table A-13 Service Tree Events Observed for UC3 - UCS Blade Failure - CUCM

| Location | Summary |
| --- | --- |
| ... -> CUCM-71-pub | The virtual machine CUCM-71-pub running on host 10.11.3.152 is Disconnected. Message: KVM_VM_Disconnected_Cisco_HCM[(Event_Type=N"ON VM:cisco-10.11.3.152:ESX ON 31551544 (Event_Type=VmDisconnectedEvent)] |
| CUCxn-CL-C071-1 -> Voice Service | Meta event for Voice Service - C071 |
| ... -> CUCM-71-pub | PerformancePollingStopped::Component= cucm-71-pub.customer.com; Error Message String= 06-Jul-2012 14:47:59 EDT,cucm-71-pub.customer.com,192.6.4.123,Cannot collect data. The device is experiencing communication problems. Device may be in partially monitored state. Check HTTP(S) credentials.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=PerformancePollingStopped >; |
| ... -> Cluster_Availability --> Internode_Trunks | SDL Link Out Of Service::Component= 192.6.4.124-192.6.4.123; Local Application ID= CCM; Remote Node ID= 1; Unique Link ID= 2:100:1:100; Remote Application IP Address= 192.6.4.123; Local Node ID= 2; Remote Application ID= CCM; Default Event Name= SDL Link Out Of Service; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=SDLLinkOutOfService >; |


Next Steps


Step 1 The VM on the host is automatically brought up in another host via HA.

Step 2 The original host is brought back using the following steps:

a. Troubleshoot and resolve the blade issue.

b. ESXi Host (10.11.3.152) > Reconfigure for VMware HA.

c. Drag and drop CUCM-71-pub from the host that it moved to back to ESXi Host (10.11.3.152).

d. Clear any alarms on the CUCM VM.


UC4 - UCS Blade Failure - CUCxn

This use case describes the events that Prime Central for HCS receives if a UCS blade fails. This type of incident generates both Root Cause (RC) and Service Impact (SI) events. The CUCxn VM is automatically brought up in another host if HA is enabled on the cluster. If HA is not configured for the cluster, CUCxn nodes stay down until the UCS blade is replaced.

Observed RC-EL Events

When the UCS blade fails, numerous synthetic RCA events are observed, including VC_Host_Avlblty, VC_VM_Avlblty, UCS_Blade_Avlblty, UCS_BladeLinks, and OM_CUCxn_OM_Connectivity. Eventually, there is only one primary synthetic RCA event of UCS_Blade_Avlblty.

Table A-14 Observed Root Cause Events for UC4

| Severity | EventTypeID | Summary |
| --- | --- | --- |
| Critical | UCS_Blade_Avlblty | Synthetic Event for UCS_Blade_Avlblty group events from 10.11.2.10 |


Observed SI-EL Events

CUCxn voice mail service is impacted.

Table A-15 Observed Service Events for UC4

| Severity | Summary |
| --- | --- |
| Critical | Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C071-1 is Marginal. (Flapping Threshold Exceeded: 5 >= 5, over the last 300 s (2012-07-06 13:56:16.000 -> 2012-07-06 14:01:16.000)) |
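The summary in Table A-15 reports that a flapping threshold was exceeded (5 >= 5 over the last 300 s). How Prime Central for HCS counts state changes internally is not described here; the sketch below shows one common way to implement such a check, a sliding time window over state-change timestamps. `FlapDetector` and `record_change` are hypothetical names, and the defaults of 5 changes and 300 seconds are taken from the example summary only.

```python
from collections import deque


class FlapDetector:
    """Flag a service as flapping when it changes state too often."""

    def __init__(self, threshold=5, window_s=300):
        self.threshold = threshold  # number of changes that counts as flapping
        self.window_s = window_s    # length of the sliding window in seconds
        self._changes = deque()     # timestamps of recent state changes

    def record_change(self, timestamp_s):
        """Record one state change; return True if the threshold is reached."""
        self._changes.append(timestamp_s)
        # Drop changes that have fallen out of the window.
        while self._changes and timestamp_s - self._changes[0] > self.window_s:
            self._changes.popleft()
        return len(self._changes) >= self.threshold


detector = FlapDetector()
# Five state changes, one per minute: the fifth reaches the threshold.
results = [detector.record_change(t) for t in (0, 60, 120, 180, 240)]
# A single change much later falls outside the window of the earlier ones.
later = detector.record_change(600)
```

A deque keeps the check cheap, because expired timestamps are only ever removed from the left end.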


Observed Other-EL Events

Prime Central for HCS does not analyze these events, but they could point to potential root causes for the impacted services.


Note 10.11.2.8, 10.11.2.9, and 10.11.2.10 are the IP addresses of UCS6140 side A, UCS6140 side B, and UCSM, respectively.


Table A-16 Observed Other Events for UC4

| Severity (S)/Customer (C)/Node (N) | EventName (EN)/EventTypeId (ET) | Summary |
| --- | --- | --- |
| S = Warning; C = C071; N = CUCxn-71-pub | EN = KVM_VM_RestartOnAlt_Host_Cisco; ET = VC_VM_Restored | Virtual machine CUCxn-71-pub was restarted on 10.11.3.141 since 10.11.3.152 failed. Message: KVM_VM_RestartOnAlt_Host_Cisco[(Event_Type=N"VmRestartedOnAlternateHostEvent" ON VM:cisco-10.11.3.141:ESX ON 31548751 (Event_Type=VmRestartedOnAlternateHostEvent)] |
| S = Indeterminate; N = 10.11.2.10 | EN = fltAdaptorUnitAdaptorReachability; ET = default | Adapter 5/2/1 is unreachable (FaultCode:fltAdaptorUnitAdaptorReachability,FaultIndex) |
| S = Major; N = 10.11.2.8 | | Network Interface (ifIndex = 503317541) Administratively Down (ifEntry.503317541) |
| S = Major; N = 10.11.2.8 | | Network Interface (ifIndex = 486548517) Down, should be Up (ifEntry.486548517) |
| S = Major; N = 10.11.2.8 | | Link Down (vethernet9254) |
| S = Major; N = 10.11.2.8 | | Network Interface (ifIndex = 486540323) Down, should be Up (ifEntry.486540323) |
| S = Major; N = 10.11.2.9 | | Network Interface (ifIndex = 486540322) Down, should be Up (ifEntry.486540322) |
| S = Major; N = 10.11.2.9 | | Link Down (vethernet1059) |
| S = Major; N = 10.11.2.8 | | Link Down (vethernet1060) |
| S = Major; N = 10.11.2.9 | | Link Down (vethernet9253) |
| S = Major; N = 10.11.2.8 | | Network Interface (ifIndex = 520355904) Down, should be Up (ifEntry.520355904) |
| S = Major; N = 10.11.2.8 | | Link Down (Ethernet5/1/2) |
| S = Major; N = 10.11.2.9 | | Network Interface (ifIndex = 503317540) Administratively Down (ifEntry.503317540) |
| S = Major; N = 10.11.2.9 | | Network Interface (ifIndex = 486548516) Down, should be Up (ifEntry.486548516) |
| S = Major; N = 10.11.2.8 | | Network Interface (ifIndex = 520355904) Down, should be Up (ifEntry.520355904) |
| S = Major; N = 10.11.2.8 | | Link Down (Ethernet5/1/2) |


Service Tree Event Overlay Location and Content

SIA events are overlaid on the Service Tree in the Service Availability view.

Table A-17 Observed Service Tree Events for UC4

| Location | Summary |
| --- | --- |
| ... -> CUCxn-71-pub | PerformancePollingStopped::Component= cucxn-71-pub.customer.com; Error Message String= 06-Jul-2012 13:59:59 EDT,cucxn-71-pub.customer.com,192.6.4.125,Cannot collect data. The device returned no data from a required MIB.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=PerformancePollingStopped >; |
| ... -> CUCxn-71-pub | Unresponsive::Component= cucxn-71-pub.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Hardware:VMware, 1 Intel(R) Xeon(R) CPU X5680 @ 3.33GHz, 4096 MB Memory: Software:UCOS 5.0.0.0-2; DiscoveredFirstAt= 06-22-2012 11:32:41; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.4.125; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 07-05-2012 18:07:05; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=Unresponsive >; |


Next Steps


Step 1 The VM on the host is automatically brought up in another host via HA.

Step 2 The original host is brought back using the following steps:

a. Troubleshoot and resolve the blade issue.

b. ESXi Host (10.11.3.152) > Reconfigure for VMware HA.

c. Drag and drop CUCxn-71-pub from the host that it moved to back to ESXi Host (10.11.3.152).

d. Clear any alarms on the CUCxn VM.


UC5 - Application Cold Failure - CUCM

This use case describes the events that Prime Central for HCS receives if a CUCM server restarts. This type of incident generates both Root Cause (RC) and Service Impact (SI) events.

Observed RC-EL Events

When the CUCM server restarts, numerous synthetic RCA events are observed, including OM_CUCM_Processes, OM_CUCM_TFTP_Processes, OM_CUCM_Endpt_Connectivity, and OM_CUCM_OM_Connectivity.

Eventually, there is only one primary synthetic RCA event of OM_CUCM_NodeRestart.

Table A-18 Observed Root Cause Events for UC5

| Severity | EventTypeID | Summary |
| --- | --- | --- |
| Warning | OM_CUCM_NodeRestart | DeviceRestarted::Component= cucm-71-pub.customer.com; Default Event Name= DeviceRestarted; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=DeviceRestarted >; |


Observed SI-EL Events

CUCM voice service impacts voice mail and presence.

Table A-19 Observed Service Events for UC5

| Severity | Summary |
| --- | --- |
| Minor | Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C071-1 is Marginal |
| Minor | Overall Attribute of the Customer_Presence_Service_Template tag of CUP-CL-C071-1 is Marginal. |
| Minor | Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C071-1 is Marginal. |


Observed Other-EL Events

None.

Service Tree Event Overlay Location and Content

SIA events are overlaid on the Service Tree in the Service Availability view.

Table A-20 Service Tree Events Observed for UC5 - Application Cold Failure - CUCM

| Location | Summary |
| --- | --- |
| ... -> Voice Service | Meta event for Voice Service - C071 |
| ... -> CUCM-71-pub | PerformancePollingStopped::Component= cucm-71-pub.customer.com; Error Message String= 16-Jul-2012 16:19:54 EDT,cucm-71-pub.customer.com,192.6.4.123,Cannot collect data. The device is experiencing communication problems. Device may be in partially monitored state. Check HTTP(S) credentials.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=PerformancePollingStopped >; |
| ... -> Cluster_Availability --> Internode_Trunks | SDL Link Out Of Service::Component= 192.6.4.124-192.6.4.123; Local Application ID= CCM; Remote Node ID= 1; Unique Link ID= 2:100:1:100; Remote Application IP Address= 192.6.4.123; Local Node ID= 2; Remote Application ID= CCM; Default Event Name= SDL Link Out Of Service; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=SDLLinkOutOfService >; |


Next Steps

The system recovers automatically after the restart.

The event with the EventTypeId of OM_CUCM_NodeRestart will automatically clear in 60 minutes.

UC6 - Application Cold Failure - CUCxn

This use case describes the events that Prime Central for HCS receives if a CUCxn server restarts. This type of incident generates both Root Cause (RC) and Service Impact (SI) events.

Observed RC-EL Events

When the CUCxn server restarts, a synthetic RCA event of OM_CUCxn_OM_Connectivity is observed.

Table A-21 Observed Root Cause Events for UC6

Severity
EventTypeID
Summary

Critical

OM_CUCxn_OM_Connectivity

Synthetic Event for OM_CUCxn_OM_Connectivity group events from cucxn-71-pub.customer.com


Observed SI-EL Events

CUCxn voice mail service is impacted.

Table A-22 Observed Service Events for UC6

Severity
Summary

Minor

Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C071-1 is Marginal.


Observed Other-EL Events

Prime Central for HCS does not analyze these events, but they could point to potential root causes for the impacted services.

Table A-23 Observed Other Events for UC6

Severity (S)/Customer (C)/Node (N)
EventName (EN)/EventTypeId (ET)
Summary

S = Warning

C = C071

N = cucxn-71-pub.customer.com

EN = AutoFailbackSucceeded

ET = default

AutoFailbackSucceeded::Component= 192.6.4.125-null; Detail= %1 : PEER_REBOOT

S = Warning

C = C071

N = cucxn-71-pub.customer.com

EN = DeviceRestarted

ET = default

CUST_C071_CLS_CUCXN_CUCxn-CLC071-

1

DeviceRestarted::Component= cucxn-71-pub.customer.com; Default Event Name= DeviceRestarted; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=DeviceRestarted >;


Service Tree Event Overlay Location and Content

SIA events are overlaid on the Service Tree in the Service Availability view.

Table A-24 Observed Service Tree Events for UC6

Location
Summary

... -> CUCM-71-pub

PerformancePollingStopped::Component= cucxn-71-pub.customer.com; Error Message String= 17-Jul-2012 08:51:50 EDT,cucxn-71-pub.customer.com,192.6.4.125,Cannot collect data. The device returned no data from a required MIB.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=PerformancePollingStopped >;


Next Steps

The system recovers automatically after it restarts.

The event with the EventTypeId of OM_CUCxn_OM_Connectivity clears automatically when the issue clears on the server.

Should there be a different type of OS failure, other recovery steps would be required.

UC7 - Changing the Number of Registered Gateways and Media Devices

This use case describes the events that the Prime Central for HCS dashboard displays if the number of registered gateways and media devices changes in the CUCM cluster. This type of incident generates both Root Cause (RC) and Service Impact (SI) events.

Observed RC-EL Events

Decreasing the number of registered gateways or media devices generates synthetic RCA events for OM_CUCM_Registration and OM_CUCM_Endpt_Connectivity. When the media device registers again, the OM_CUCM_Endpt_Connectivity event is cleared. The raw events for OM_CUCM_Registration are Number Of Registered MediaDevices Decreased and Number Of Registered MediaDevices Increased.

Table A-25 Observed Root Cause Events for UC7

Severity
EventTypeID
Summary

Warning

OM_CUCM_Registration

Synthetic Event for OM_CUCM_Registration group events from CUCM-CL-C070-1

Critical

OM_CUCM_Endpt_Connectivity

Synthetic Event for OM_CUCM_Endpt_Connectivity group events from CUCM-CL-C070-1


Observed SI-EL Events

A change in the number of registered gateways and media devices may not impact voice mail and presence services, but by default Prime Central for HCS shows an impact on voice mail and presence services if voice service is impaired.

Table A-26 Observed Service Events for UC7

Severity
Summary

Minor

Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C070-1 is Marginal.

Minor

Overall Attribute of the Customer_Presence_Service_Template tag of CUP-CL-C070-1 is Marginal.

Minor

Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C070-1 is Marginal.


Observed Other-EL Events

Prime Central for HCS does not analyze these events, but they point to potential root causes for impacted services. Currently, cluster-level events do not participate in RCA correlation.

The raw event mapped to the cluster-level EventTypeId OM_CUCM_Endpt_Connectivity is marked as unknown and does not participate in RCA or SIA. Therefore, the OM_CUCM_Endpt_Connectivity raw event shows up in the Other-EL field.

The raw event mapped to the cluster-level EventTypeId OM_CUCM_Registration is also marked as unknown and does not participate in RCA, but it does participate in SIA. Therefore, the OM_CUCM_Registration raw event does not show up in the Other-EL field.
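
The two cases described above suggest a simple rule: a raw event appears in the Other-EL field only when it participates in neither RCA nor SIA. The following Python sketch expresses that rule. It is inferred from these two examples only and is not taken from the product's implementation; the EventType class and the shows_in_other_el function are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass
class EventType:
    """Hypothetical record of how an EventTypeId is treated."""
    name: str
    in_rca: bool  # participates in root cause analysis
    in_sia: bool  # participates in service impact analysis


def shows_in_other_el(event_type):
    # Assumed rule: listed under Other-EL only if excluded from both analyses.
    return not event_type.in_rca and not event_type.in_sia


endpt = EventType("OM_CUCM_Endpt_Connectivity", in_rca=False, in_sia=False)
registration = EventType("OM_CUCM_Registration", in_rca=False, in_sia=True)

for event_type in (endpt, registration):
    print(event_type.name, shows_in_other_el(event_type))
```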

Table A-27 Observed Other Events for UC7

Severity (S)/Customer (C)/Node (N)
EventName (EN)/EventTypeId (ET)
Summary

S = Critical

N = CUCM-CL-C070-1

EN = EndPointLostContact

ET = OM_CUCM_Endpt_Connectivity

EndPointLostContact::Component= CUCM-CL-C070-1-CFB_2; EndPoint Name= CFB_2; EndPoint IPAddress= 200.1.1.11; EndPoint Status= UnRegistered; EndPoint Type= Conference Bridge; Device Pool= Default; CUCM Node= 192.6.4.116; Timestamp= 2012-07-18 13:36:50.326; Default Event Name= EndPointLostContact; DescriptionURL= <


Service Tree Event Overlay Location and Content

SIA events are overlaid on the Service Tree in the Service Availability view.

Table A-28 Observed Service Tree Events for UC7 

Location
Summary

...->Call Control--> Registration

Number Of Registered MediaDevices Decreased::Component= VECUCM- CL-C070-1-RTMTSyslog- [Id#1342633009817]; Detail= Number of registered Media Devices decreased between consecutive polls. Current monitored precanned object has decreased by 1 The alert is generated on Wed Jul 18 13:37:19 EDT 2012 on cluster CUCM-CL-C070-1.][App ID=Cisco AMC Service][Cluster ID=][Node ID=CUCM-70-pub]: RTMT Alert; Default Event Name= Number Of Registered MediaDevices Decreased; DescriptionURL= < [http://150.0.0.52:1741/ CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=NumberOfRegisteredMediaDevicesDecreased] >;

...-> Voice Service

Meta event for Voice Service - C070


Next Steps


Step 1 Right-click Synthetic RC event > Show Contained Events to display the corresponding raw events.

Step 2 Right-click Raw event > Event Details > Next Steps to display the following recommendation:

Go to CUCM to verify the registration status of the reported endpoint. Verify that 
IP connectivity exists between the cluster and the endpoint.

Step 3 To clear this event, go to the CUCM Administration page > Service Parameter screen and set the Run Flag to True for the conference bridge.


UC8 - TFTP Server for UC Services - Critical Processes Failure

This use case describes the events that the Prime Central for HCS dashboard displays if critical processes such as TFTP service fail. In the UC environment, TFTP service is essential for new UC endpoints, which use TFTP to download code and register with CUCM servers.

Observed RC-EL Events

The TFTP process running on the CUCM Publisher system is forced to stop running.

Table A-29 Observed Root Cause Events for UC8

Severity
Summary

Critical

Synthetic Event for OM_CUCM_TFTP_Processes group events from cucm-81-pub.customer.com


Observed SI-EL Events

A TFTP service failure does not impact the overall voice service, including voice mail and presence services. It affects only new endpoints, which are stranded because they cannot download the image that they need to operate.

Table A-30 Observed Service Events for UC8

Severity
Summary

Minor

Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C081-1 is Marginal.

Minor

Overall Attribute of the Customer_Presence_Service_Template tag of CUP-CL-C081-1 is Marginal.

Minor

Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C081-1 is Marginal.


Observed Other-EL Events

None.

Service Tree Event Overlay Location and Content

SIA events are overlaid on the Service Tree in the Service Availability view.

Table A-31 Observed Service Tree Events for UC8

Location
Summary

...->TFTP_App

ServiceDown::Component= VS-cucm-81-pub.customer.com/Cisco Tftp; ProductName= Cisco Tftp; CurrentState= Stopped; Default Event Name= ServiceDown; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=ServiceDown >;

...-> Voice Service

Meta event for Voice Service - C081


Next Steps


Step 1 Right-click Synthetic RC event > Show Contained Events to display the corresponding raw events.

Step 2 Right-click Raw event > Event Details > Next Steps to display the following recommendation:

Identify which services are not running. You can start the service manually from the 
Administrator Service Control page. To disable monitoring for a specific service, go to 
the Detailed Device View of the device, select the specific service, and change the 
managed state to False.

Step 3 Check whether there are any core and service trace files. If they are available, then download them.


UC9 - Detecting and Correlating Customer Voice Quality Degradation

This use case describes the events that the Prime Central for HCS dashboard displays if voice quality degradation is detected using aggregated quality event generation per cluster. This type of incident generates only Service Impact (SI) events.

Observed RC-EL Events

None.

Observed SI-EL Events

Table A-32 Observed Service Events for UC9 

Severity
Summary

Minor

Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C072-1 is Marginal.

Minor

Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C072-1 is Marginal.

Minor

Overall Attribute of the Customer_Presence_Service_Template tag of CUP-CL-C072-1 is Marginal.


Observed Other-EL Events

None.

Service Tree Event Overlay Location and Content

Table A-33 Observed Service Tree Events for UC9

Location
Summary

...-> Voice Service

Meta event for Voice Service - C072

...-> VoiceQuality

ServiceQualityThresholdCrossed::Component= Device Pool:devicepool3449; Source= Cisco Unified Operations Manager; Impacted Endpoints at the time event was raised= 1; Threshold Percentage at the time event was raised= 10.0; Registered Phone Count at the time event was raised= 1; Default Event Name= ServiceQualityThresholdCrossed; DescriptionURL= < http://150.0.0.52:1741/ CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=ServiceQualityThresholdCrossed >;


Next Steps


Step 1 Go to CUOM/SM and generate a call quality report.

Step 2 Check network for possible delay/jitter issues.


UC11 - VMware VM Failure - CUCM

This use case describes the events that the Prime Central for HCS dashboard displays if a VM fails abruptly. This type of incident generates both Root Cause (RC) and Service Impact (SI) events.

Observed RC-EL Events

When the VM shuts down, numerous synthetic RCA events are observed, including VC_VM_Avlblty, OM_CUCM_NodeRestart, OM_CUCM_Redundancy, and OM_CUCM_Endpt_Connectivity. The CUCM-C081-pub node generates an OM_CUCM_Redundancy event. This event should be treated as the root cause event for the CUCM publisher node because correlation between publisher and subscriber nodes (sibling correlation) is not currently supported.

Table A-34 Observed Root Cause Events for UC11

Severity
EventTypeID
Summary

Critical

VC_VM_Avlblty

Synthetic Event for VC_VM_Avlblty group events from CUCM-81-sub2

Critical

OM_CUCM_Endpt_Connectivity

Synthetic Event for OM_CUCM_Endpt_Connectivity group events from CUCM-CL-C081-1

Critical

OM_CUCM_Redundancy

Synthetic Event for OM_CUCM_Redundancy group events from cucm-81-tftp.customer.com

Critical

OM_CUCM_Redundancy

Synthetic Event for OM_CUCM_Redundancy group events from cucm-81-pub.customer.com

Critical

OM_CUCM_Redundancy

Synthetic Event for OM_CUCM_Redundancy group events from cucm-81-sub1.customer.com


Observed SI-EL Events

Table A-35 Observed Service Events for UC11

Severity
Summary

Minor

Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C081-1 is Marginal.

Minor

Overall Attribute of the Customer_Presence_Service_Template tag of CUP-CL-C081-1 is Marginal.

Minor

Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C081-1 is Marginal.


Observed Other-EL Events

Prime Central for HCS does not analyze these events, but they could point to potential root causes for impacted services.

Table A-36 Observed Other Events for UC11

Severity (S)/Customer (C)/Node (N)
EventName (EN)/EventTypeId (ET)
Summary

S = Critical

C = C081

N = CUCM-CL-C081-1

EN = DBReplicationFailure

ET = OM_CUCM_BackupRestore

DBReplicationFailure::Component= VECUCM- CL-C081-1; CallManagerList= 192.6.4.195,192.6.4.202,192.6.4.197,192.6.4.196; ReplicationStatus= Replication is bad in the cluster; CustomerName= C081; Default Event Name= DBReplicationFailure; DescriptionURL= < [http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=DBReplicationFailure] >;

S = Critical

C = C081

N = CUCM-CL-C081-1

EN = EndPointLostContact

ET = OM_CUCM_Endpt_Connectivity

EndPointLostContact::Component= CUCM-CL-C081-1-MTP_6; EndPoint Name= MTP_6; EndPoint IPAddress= 200.1.1.17; EndPoint Status= UnRegistered; EndPoint Type= Media Termination Point; Device Pool= Default; CUCM Node= 192.6.4.195; Timestamp= 2012-06-25 17:38:06.634; Default Event Name= EndPointLostContact; DescriptionURL= < [http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=EndPointLostContact] >;

Critical

OM_CUCM_Redundancy

Synthetic Event for OM_CUCM_Redundancy group events from cucm-81-tftp.customer.com

Critical

OM_CUCM_Redundancy

Synthetic Event for OM_CUCM_Redundancy group events from cucm-81-pub.customer.com

Critical

OM_CUCM_Redundancy

Synthetic Event for OM_CUCM_Redundancy group events from cucm-81-sub1.customer.com


Service Tree Event Overlay Location and Content

SIA events are overlaid on the Service Tree in the Service Availability view. Table A-37 shows the service tree events observed during testing.

Table A-37 Observed Service Tree Events for UC11 

Location
Summary

...-> Cluster_Availability --> Internode_Trunks

SDL Link Out Of Service::Component= 192.6.4.195-192.6.4.197; Local Application ID= CCM; Remote Node ID= 5; Unique Link ID= 1:100:5:100; Remote Application IP Address= 192.6.4.197; Local Node ID= 1; Remote Application ID= CCM; Default Event Name= SDL Link Out Of Service; DescriptionURL= < [http://150.0.0.52:1741/ CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=SDLLinkOutOfService] >;

...-> CUCM-81-sub2 --> VM Availability

The virtual machine CUCM-81-sub2 running on 10.11.3.147 is offline. Message: KVM_VM_Powered_Off_Cisco_HCM[(Event_Type=N"ON VM:cisco-10.11.3.147:ESX ON 30374781 (Event_Type=VmPoweredOffEvent)]

...-> CUCM-81-pub --> VM Resources

Alarm ''Virtual Machine Disk Latency High'' on CUCM-81-sub2 changed from Green to Gray. Message: KVM_VM_Disk_Latency[(Event_Type=N"AlarmStatusChangedEvent" AND Event_TextLIKEN"Virtual*Machine*Disk*Latency" ON VM:cisco-10.11.3.147:ESX ON 30374777 (Event_Type=AlarmStatusChangedEvent Event_Text=Alarm ''Virtual Machine Disk Latency High'' on CUCM-81-sub2 changed from Green to Gray)]

...-> Cluster_Availability --> Sub: CUCM-81-sub2

Unresponsive::Component= cucm-81-sub2.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Linux release:2.6.18-194.26.1.el5PAE machine:i686; DiscoveredFirstAt= 06-22-2012 16:43:59; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.4.197; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 06-24-2012 18:06:21; Default Event Name= Unresponsive; DescriptionURL= < [http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=Unresponsive] >;

...-> Cluster_Availability --> Sub: CUCM-81-sub2

PerformancePollingStopped::Component= cucm-81-sub2.customer.com; Error Message String= 25-Jun-2012 17:39:58 EDT,cucm-81-sub2.customer.com,192.6.4.197,Cannot collect data. The device is experiencing communication problems. Device may be in partially monitored state. Check HTTP(S) credentials.; Default Event Name= PerformancePollingStopped; DescriptionURL= < [http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=PerformancePollingStopped] >;

...-> Voice Service

Meta event for Voice Service - C081


Next Steps


Step 1 Right-click Synthetic RC event > Show Contained Events to display the corresponding raw events.

Step 2 Right-click the Raw event > Event Details > Next Steps to display the following recommendation:

Go to CUCM to verify the registration status of the reported endpoint. 
 
   

Step 3 Verify whether IP connectivity exists between the cluster and endpoints.


UC12 - CUCM Clustering Problems

This use case describes the events that the Prime Central for HCS dashboard displays for CUCM clustering issues, such as a server running a different version of software and database replication issues in the cluster. Prime Central for HCS generates Root Cause (RC) and Service Impact (SI) events for such incidents.

Such problems impair the performance of the call processing nodes in the CUCM cluster, so they need immediate attention.

Observed RC-EL Events

When the CUCM Publisher is brought up with an older software version while the Subscriber nodes run a newer version, many synthetic RCA events are observed, including OM_CUCM_Processes, OM_CUCM_NodeRestart, and OM_CUCM_Redundancy, as shown in the following table.

Table A-38 Observed Root Cause Events for UC12

Severity
EventTypeId
Summary

Critical

OM_CUCM_Processes

Synthetic Event for OM_CUCM_Processes group events from cucm-70-sub.customer.com.

Warning

OM_CUCM_Redundancy

Synthetic Event for OM_CUCM_Redundancy group events from CUCM-CL-C070-1

Warning

OM_CUCM_Registration

Synthetic Event for OM_CUCM_Registration group events from CUCM-CL-C070-1

Critical

OM_CUCM_Redundancy

Synthetic Event for OM_CUCM_Redundancy group events from cucm-70-pub.customer.com

Warning

OM_CUCM_NodeRestart

Synthetic Event for OM_CUCM_NodeRestart group events from cucm-70-pub.customer.com


Observed SI-EL Events

CUCM voice service impacts voice mail and presence. Table A-39 shows the service events observed during testing.

Table A-39 Observed Service Events for UC12

Severity
Summary

Minor

Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C070-1 is Marginal.

Minor

Overall Attribute of the Customer_Presence_Service_Template tag of CUP-CL-C070-1 is Marginal.

Minor

Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C070-1 is Marginal.


Observed Other-EL Events

Prime Central for HCS does not analyze these events, but they could point to potential root causes for impacted services.

Table A-40 Observed Other Events for UC12

Severity (S)/Customer (C)/Node (N)
EventName (EN)/EventTypeId (ET)
Summary

S = Critical

C = C070

N = CUCM-CL-C070-1

EN = DBReplicationFailure

ET = OM_CUCM_BackupRestore

DBReplicationFailure::Component= VECUCM- CL-C070-1; CallManagerList= 192.6.4.116,192.6.4.117; ReplicationStatus= Replication is bad in the cluster; CustomerName= C070; Default Event Name= DBReplicationFailure; DescriptionURL= < [http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=DBReplicationFailure] >;

S = Warning

C = C070

N = CUCM-CL-C070-1

EN = SystemVersionMismatched

ET = OM_CUCM_Redundancy

SystemVersionMismatched::Component= VE-CUCM-CL-C070-1; NodeVersionInformation= cucm-70- pub.customer.com(8.6.2.20000-2),cucm-70- sub.customer.com(8.6.2.21900-5); CustomerName= C070; Default Event Name= SystemVersionMismatched; DescriptionURL= < [http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine? tag=SystemVersionMismatched] >;
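
The NodeVersionInformation field of the SystemVersionMismatched event lists each node followed by its software version in parentheses. The following Python sketch shows how that field could be checked for a mismatch outside the product. It is illustrative only: the node_versions and versions_mismatched helpers are hypothetical names, and the sample value is the one from Table A-40 with the line-wrap spaces removed.

```python
import re


def node_versions(info):
    """Return {node: version} from 'node(version),node(version)' text."""
    # Each entry is a host name followed by its version in parentheses.
    return dict(re.findall(r"([^,()\s]+)\(([^)]+)\)", info))


def versions_mismatched(info):
    """True when the listed nodes do not all report the same version."""
    return len(set(node_versions(info).values())) > 1


info = (
    "cucm-70-pub.customer.com(8.6.2.20000-2),"
    "cucm-70-sub.customer.com(8.6.2.21900-5)"
)

for node, version in node_versions(info).items():
    print(node, version)
print("Mismatch:", versions_mismatched(info))
```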


Service Tree Event Overlay Location and Content

SIA events are overlaid on the Service Tree in the Service Availability view.

Table A-41 Observed Service Tree events for UC12 

Location
Summary

...-> Cluster_Availability --> Internode_Trunks

SDL Link Out Of Service::Component= 192.6.4.117-192.6.4.116; Local Application ID= CCM; Remote Node ID= 1; Unique Link ID= 2:100:1:100; Remote Application IP Address= 192.6.4.116; Local Node ID= 2; Remote Application ID= CCM; Default Event Name= SDL Link Out Of Service; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=SDLLinkOutOfService >;

...-> Cluster_Availability --> Internode_Trunks

SystemVersionMismatched::Component= VE-CUCM-CL-C070-1; NodeVersionInformation= cucm-70- pub.customer.com(8.6.2.20000-2),cucm-70- sub.customer.com(8.6.2.21900-5); CustomerName= C070; Default Event Name= SystemVersionMismatched; DescriptionURL= < http://150.0.0.52:1741/ CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=SystemVersionMismatched >;

...-> Voice Service

Meta event for Voice Service - C070


...

Next Steps


Step 1 Right-click the Synthetic RC event > Show Contained Events to display the corresponding raw events.

Step 2 Right-click the Raw event > Event Details > Next Steps to display the following recommendation:

Investigate why the remote Communications Manager is not running or whether a network 
problem exists.
 
   

Step 3 Correct the version issue by switching back to the original version.


UC13 - Change in Number of Registered Phones

This use case describes the events that the Prime Central for HCS dashboard displays if the number of registered phones in the cluster drops more than a configured percentage between consecutive polls. Prime Central for HCS generates Root Cause (RC) and Service Impact (SI) events for such incidents.
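
The check described above compares the registered-phone count from two consecutive polls against a configured percentage. The following Python sketch shows one plausible form of that comparison. It is an assumption for illustration and not the product's actual logic: the function names are hypothetical, and the default of 30 percent is taken from the sample event in Table A-44.

```python
def drop_percentage(previous, current):
    """Percentage by which the count fell between two polls (0 if it rose)."""
    if previous <= 0:
        return 0.0  # nothing was registered, so no drop can be measured
    return max(0.0, (previous - current) * 100.0 / previous)


def exceeds_threshold(previous, current, threshold_pct=30.0):
    # "Drops more than a configured percentage": strictly greater than.
    return drop_percentage(previous, current) > threshold_pct


# Hypothetical poll results: 200 phones registered, then 120.
print(drop_percentage(200, 120))
print(exceeds_threshold(200, 120))
print(exceeds_threshold(200, 150))
```

A strict comparison is used here because the text says "more than"; a drop of exactly the configured percentage does not trigger in this sketch.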

Observed RC-EL Events

When the number of registered phones decreases, only one synthetic RCA event, OM_CUCM_Registration, is observed.

Table A-42 Observed Root Cause Events for UC13

Severity
EventTypeID
Summary

Warning

OM_CUCM_Registration

Synthetic Event for OM_CUCM_Registration group events from CUCM-CL-C072-1.


Observed SI-EL Events

CUCM voice service impacts presence and voice mail. Table A-43 shows the service events observed during testing.

Table A-43 Observed Service Events for UC13

Severity
Summary

Minor

Overall Attribute of the Customer_Presence_Service_Template tag of CUP-CL-C072-1 is Marginal.

Minor

CUST_C072_CLS_CUCXN_CUCxn-CL-C072-1 Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C072-1 is Marginal.

Minor

CUST_C072_CLS_CUCM_CUCM-CLC072- Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C072-1 is Marginal.


Observed Other-EL Events

None.

Service Tree Event Overlay Location and Content

Table A-44 Observed Service Tree Events for UC13

Location
Summary

...-> Call Control --> Registration

Number Of Registered Phones Dropped::Component= VECUCM- CL-C072-1-RTMTSyslog- [Id#1342640053391]; Detail= Number of registered phones in the cluster drop more than configured percentage between consecutive polls. Configured high threshold is 30%.

...-> Call Control --> Registration

PhoneUnregThresholdExceeded::Component= Device Pool:devicepool3449; Unreg Count= 1;Total Count= 1;Threshold In %= 10.0;ClusterName= CUCM-CL-C072-1;Device Pool= devicepool3449;Default Event Name= PhoneUnregThresholdExceeded; DescriptionURL= <

...-> Voice Service

Meta event for Voice Service - C072


Next Steps


Step 1 Right-click the Synthetic RC event > Show Contained Events to display the corresponding raw events.

Step 2 Right-click the Raw event > Event Details > Next Steps to display the following recommendation:

Phone registration status must be monitored for sudden changes. If the registration status 
changes slightly and readjusts quickly over a short time frame, it could indicate a phone 
move, addition, or change. A sudden smaller drop in the phone registration counter could 
indicate a localized outage; for instance, an access switch or a WAN circuit outage or 
malfunction. A significant drop in registered phone level requires immediate attention from 
the administrator. 

Step 3 Register the phones to clear the event.


UC15 - CUCxn Critical Process Failure

This use case describes the events that the Prime Central for HCS dashboard displays if a critical process fails in CUCxn. Prime Central for HCS generates Root Cause (RC) and Service Impact (SI) events for such incidents.

Observed RC-EL Events

If a critical process is killed in the voice mail service (CUCxn), only one synthetic RCA event, OM_CUCxn_Processes, is generated.

Table A-45 Observed Root Cause Events for UC15

Severity
EventTypeID
Summary

Critical

OM_CUCxn_Processes

Synthetic Event for OM_CUCxn_Processes group events from cucxn-72-pub.customer.com


Observed SI-EL Events

Critical process failures impact voice mail service.

Table A-46 Observed Service Events for UC15

Severity
EventTypeID
Summary

Minor

CUST_C072_CLS_CUCXN_CUCxn- CL-C072-1

Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C072-1 is Marginal.


Observed Other-EL Events

None.

Service Tree Event Overlay Location and Content

Table A-47 Observed Service Tree Events for UC15

Location
Summary

Cluster Availability

->Pub-CUCxn-72-pub

ServiceDown::Component= VS-cucxn-72-pub.customer.com/Connection Conversation Manager; ProductName= Connection Conversation Manager; CurrentState= Stopped; Default Event Name= ServiceDown;


Next Steps


Step 1 Right-click the Synthetic RC event > Show Contained Events to display the corresponding raw events.

Step 2 Right-click the Raw event > Event Details > Next Steps to display the following recommendation:

Identify which services are not running. You can start the service manually from the 
Administrator Service Control page. To disable monitoring for a specific service, go to 
the device's Detailed Device View, select the specific service, and change the managed 
state to False. Check to see if there are any core files. Download the core files, if any, 
as well as service trace files. Events are removed for Unified CM only. You may need to 
manually clear these Unified CM events after your upgrade is complete.
 
   

Step 3 Browse to the IP address of the CUCxn server, select Cisco Unity Connection Serviceability > Tools > Service Management, and start the Connection Conversation Manager service.


UC16 - VMware VM Failure - CUCxn

This use case describes the events that the Prime Central for HCS dashboard displays if a VM running CUCxn fails abruptly. Prime Central for HCS generates Root Cause (RC) and Service Impact (SI) events for such incidents.

Observed RC-EL Events

When the VM shuts down, numerous synthetic RCA events are observed, including VC_VM_Avlblty, OM_CUCM_NodeRestart, and OM_CUCM_OM_Connectivity. Eventually, Prime Central for HCS stabilizes to one root cause, VC_VM_Avlblty.

Table A-48 Observed Root Cause Events for UC16

Severity
EventTypeID
Summary

Critical

VC_VM_Avlblty

Synthetic Event for VC_VM_Avlblty group events from CUCxn-72-pub.


Observed SI-EL Events

VM failure impacts voice mail service.

Table A-49 Observed Service Events for UC16

Severity
Summary

Minor

Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C072-1 is Marginal.


Observed Other-EL Events

None.

Service Tree Event Overlay Location and Content

SIA events are overlaid on the Service Tree in the Service Availability view.

Table A-50 Observed Service Tree Events for UC16 

Location
Summary

...-> Cluster_Availability --> PUB:CUCXn-72-pub

PerformancePollingStopped::Component= cucxn-72-pub.customer.com; Error Message String= 06-Jul-2012 11:08:43 EDT,cucxn-72-pub.customer.com,192.6.4.132,Cannot collect data. The device returned no data from a required MIB.; Default Event Name= PerformancePollingStopped; DescriptionURL= < [http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=PerformancePollingStopped] >;

...-> Cluster_Availability --> PUB:CUCXn-72-pub

Unresponsive::Component= cucxn-72-pub.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Hardware:VMware, 1 Intel(R) Xeon(R) CPU X5680 @ 3.33GHz, 4096 MB Memory: Software:UCOS 5.0.0.0-2; DiscoveredFirstAt= 06-22-2012 11:32:48; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.4.132; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 07-05-2012 18:07:17; Default Event Name= Unresponsive; DescriptionURL= < [http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=Unresponsive] >;

...-> CUCxn-72-pub--> VM Availability

The virtual machine CUCxn-72-pub running on 10.11.3.148 is offline. Message: KVM_VM_Powered_Off_Cisco_HCM[(Event_Type=N"ON VM:cisco-10.11.3.148:ESX ON 31539282 (Event_Type=VmPoweredOffEvent)]


Next Steps


Step 1 Right-click the Synthetic RC event > Show Contained Events to display the corresponding raw events.

Step 2 Right-click the Raw event > Event Details > Next Steps to display the following recommendation:

Check if the device is reachable from Unified Operations Manager.

UC17 - CUCxn Clustering Problems

This use case describes the events that the Prime Central for HCS dashboard displays for CUCxn clustering issues, such as a server running a different version of software and database replication issues in the cluster. Prime Central for HCS generates Root Cause (RC) and Service Impact (SI) events for such incidents.

Observed RC-EL Events

When the CUCxn publisher and subscriber software versions do not match, three synthetic RCA events are observed: one OM_CUCxn_Redundancy event and two OM_CUCxn_Processes events.

Table A-51 Observed Root Cause Events for UC17

Severity
EventTypeID
Summary

Critical

OM_CUCxn_Redundancy

Synthetic Event for OM_CUCxn_Redundancy group events from cucxn-72-pub.customer.com

Critical

OM_CUCxn_Processes

Synthetic Event for OM_CUCxn_Processes group events from cucxn-72-sub.customer.com

Critical

OM_CUCxn_Processes

ServiceDown::Component= VS-cucxn-72- sub.customer.com/Connection Inbox RSS Feed; ProductName= Connection Inbox RSS Feed; CurrentState= Stopped; Default Event Name= ServiceDown; DescriptionURL= <


Observed SI-EL Events

A version mismatch impacts voice mail service.

Table A-52 Observed Service Events for UC17

Severity
Summary

Minor

CUST_C072_CLS_CUCXN_CUCxn-CL-C072-1 Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C072-1 is Marginal.


Observed Other-EL Events

None.

Service Tree Event Overlay Location and Content

Table A-53 Observed Service Tree Events for UC17

Location
Summary

...->Cluster_Availability -->Internode_Trunks

NoConnectionToPeer::Component= 192.6.4.132-RTMTSyslog; Detail= %1 : cucxn-72-pub.customer.com AppID : CuSrm ClusterID : NodeID : ;

cucxn-72-pub.customer.com

ServiceDown::Component= VS-cucxn-72-pub.customer.com/Connection Voice Mail Web Service; ProductName= Connection;

cucxn-72-pub.customer.com

ServiceDown::Component= VS-cucxn-72-sub.customer.com/Connection Inbox RSS Feed; ProductName= Connection Inbox ;

cucxn-72-pub.customer.com

ServiceDown::Component= VS-cucxn-72-pub.customer.com/Connection Serviceability; ProductName= Connection ;

cucxn-72-pub.customer.com

ServiceDown::Component= VS-cucxn-72-sub.customer.com/Connection Administration; ProductName= Connection ;

cucxn-72-pub.customer.com

ServiceDown::Component= VS-cucxn-72-sub.customer.com/Connection Administration; ProductName= Connection ;

cucxn-72-pub.customer.com

ServiceDown::Component= VS-cucxn-72-pub.customer.com/Connection SNMP Agent; ProductName= Connection SNMP ;


Next Steps


Step 1 Right-click the Synthetic RC event > Show Contained Events to display the corresponding raw events.

Step 2 Right-click the Raw event > Event Details > Next Steps to display the following recommendation:

Make sure that the secondary server is active and connected to primary.
 
   

Step 3 Correct the version issue by switching back to the original version.


UC18 - CUCM Critical Process Failure

This use case describes the events that the Prime Central for HCS dashboard displays if a critical process fails in CUCM. Prime Central for HCS generates Root Cause (RC) and Service Impact (SI) events for such incidents.

Observed RC-EL Events

When a critical process is killed, two synthetic RCA events, OM_CUCM_Processes and OM_CUCM_Redundancy, are observed.

Table A-54 Observed Root Cause Events for UC18

Severity
EventTypeID
Summary

Critical

OM_CUCM_Processes

Synthetic Event for OM_CUCM_Processes group events from cucm-72-pub.customer.com

Critical

OM_CUCM_Redundancy

Synthetic Event for OM_CUCM_Redundancy group events from cucm-72-sub.customer.com


Observed SI-EL Events

CUCM voice service impacts presence and voice mail.

Table A-55 Observed Service Events for UC18

Severity
Summary

Minor

Overall Attribute of the Customer_Presence_Service_Template tag of CUP-CL-C072-1 is Marginal.

Minor

Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C072-1 is Marginal.

Minor

Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C072-1 is Marginal.


Observed Other-EL Events

None.

Service Tree Event Overlay Location and Content

Table A-56 Observed Service Tree Events for UC18

Severity
Summary

...->Cluster_availability-->PUB:CUCM-72-pub

ServiceDown::Component= CCM-cucm-72-pub.customer.com/1; CallManagerName= 192.6.4.130; CallManagerStatus= Stopped; Default Event Name= ServiceDown; DescriptionURL= < ;

...->Cluster_availability-->Internode_Trunks

SDL Link Out Of Service::Component= 192.6.4.131-192.6.4.130; Local Application ID= CCM; Remote Node ID= 1; Unique Link ID= 2:100:1:100; Remote Application IP Address= 192.6.4.130; Local Node ID= 2; Remote;

Cust_C072-> VoiceService

Meta event for Voice Service - C072.
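
Most service tree summaries in this table follow a "Name::Key= Value; Key= Value;" layout. The following Python sketch is not part of Prime Central for HCS; it only illustrates one way to split such a summary into fields when reviewing exported events. The function name parse_event_summary and the shortened sample string are assumptions made for this example.

```python
def parse_event_summary(summary: str) -> dict:
    """Split 'Name::Key= Value; Key= Value;' into a dict of fields."""
    name, sep, rest = summary.partition("::")
    if not sep:
        # No '::' separator (for example, a meta event): keep the free text.
        return {"event": None, "text": summary.strip()}
    fields = {"event": name.strip()}
    for part in rest.split(";"):
        key, eq, value = part.partition("=")
        if eq and key.strip():
            fields[key.strip()] = value.strip()
    return fields


# Shortened sample based on the ServiceDown row above.
sample = (
    "ServiceDown::Component= CCM-cucm-72-pub.customer.com/1; "
    "CallManagerName= 192.6.4.130; CallManagerStatus= Stopped; "
    "Default Event Name= ServiceDown;"
)
event = parse_event_summary(sample)
print(event["event"], event["CallManagerStatus"])
```

Summaries without a "::" separator, such as the meta event row, are returned as free text rather than parsed.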


Next Steps


Step 1 Right-click the Synthetic RC event > Show Contained Events to display the corresponding raw events.

Step 2 Right-click the Raw event > Event Details > Next Steps to display the following recommendation:

Identify which services are not running. You can start the service manually from the 
Administrator Service Control page. To disable monitoring for a specific service, go to 
the device's Detailed Device View, select the specific service, and change the managed 
state to False. Check to see if there are any core files. Download the core files, if any, 
as well as service trace files. Events are removed for Unified CM only. You may need to 
manually clear these Unified CM events after your upgrade is complete.

Step 3 Use Ctrl-C to end the running process.

Step 4 Log in to the CUCM application: enter the IP address of CUCM, then select Cisco Unified Serviceability > Tools > Feature Services to start the service.


UC19 - UCS Chassis Failure - CUCM

This use case describes the events that Prime Central for HCS receives if the chassis hosting CUCM nodes loses power. This type of incident generates both Root Cause (RC) and Service Impact (SI) events. The CUCM VM is automatically brought up in another host if HA is enabled on the cluster. If HA is not configured for the cluster, CUCM nodes stay down until the chassis is powered on. In the following example, the same chassis hosts all UC VMs for customer 80.

Observed RC-EL Events

When the chassis powers off, numerous synthetic RCA events are observed, including UCS_Chassis_Fault, UCS_Blade_Avlblty, VC_Host_Avlblty, and UCS_BladeLinks. Eventually, the UCS_Chassis_Fault synthetic RCA event remains as the root cause.

Table A-57 Observed Root Cause Events for UC19

Severity
EventTypeID
Summary

Critical

UCS_Chassis_Fault

Synthetic Event for UCS_Chassis_Fault group events from 10.13.2.1

Critical

OM_CUCM_OM_Connectivity

Synthetic Event for OM_CUCM_OM_Connectivity group events from CUCM-CL-C080-1

Critical

VC_Host_Avlblty

Synthetic Event for VC_Host_Avlblty group events from 10.13.3.31


Observed SI-EL Events

CUCM voice service impacts voice mail and presence. In this example, CUCxn and CUP VMs are hosted in the same chassis, and all voice, voice mail, and presence services are affected.

Table A-58 Observed Service Events for UC19

Severity
Summary

Critical

Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C080-1 is Bad.

Critical

Overall Attribute of the Customer_Presence_Service_Template tag of CUP-80-pub is Bad.

Critical

Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C080-1 is Bad.
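
The service event summaries in this appendix share one sentence pattern, with Minor events reporting a Marginal state (as in UC18) and Critical events reporting a Bad state (as in this table). The following Python sketch, offered only as an illustration for working with exported event text, extracts the template, instance, and state from such a summary. The regular expression and the name parse_service_event are assumptions, not a Prime Central for HCS interface.

```python
import re

# Pattern inferred from the summaries shown in the service event tables.
SERVICE_EVENT = re.compile(
    r"Overall Attribute of the (?P<template>\S+) tag of "
    r"(?P<instance>\S+) is (?P<state>\w+)\."
)


def parse_service_event(summary: str):
    """Return template, instance, and state, or None if the text differs."""
    match = SERVICE_EVENT.search(summary)
    return match.groupdict() if match else None


rows = [
    ("Critical", "Overall Attribute of the Customer_Voice_Service_Template "
                 "tag of CUCM-CL-C080-1 is Bad."),
    ("Critical", "Overall Attribute of the Customer_Presence_Service_Template "
                 "tag of CUP-80-pub is Bad."),
]
parsed = [parse_service_event(summary) for _, summary in rows]
for (severity, _), info in zip(rows, parsed):
    print(severity, info["instance"], info["state"])
```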


Observed Other-EL Events

Prime Central for HCS does not analyze these events, but they could point to potential root causes for impacted services. The following table presents a list of events observed during internal testing if HA is not enabled on the cluster.


Note 10.11.2.8, 10.11.2.9, and 10.11.2.10 are the UCS6140 side A, UCS6140 side B, and UCSM IP addresses, respectively. Georedundancytemp-sa is the name of the vCenter cluster that contains the C080 customer VMs.


Table A-59 Observed Other Events for UC19 (No HA) 

Severity (S)/Customer (C)/Node (N)
EventName (EN)/EventTypeId (ET)
Summary

S = Major

N = 10.13.2.10

EN = fltAdaptorExtIfLinkDown

ET = UCS_Adapter

Adapter uplink interface 3/4/1/1 link state: unavailable(FaultCode:fltAdaptorExtIfLinkDown,FaultIndex

S = Indeterminate

N = 10.13.2.8

EN = fltAdaptorUnitAdaptorReachability

ET = default

Adapter 3/1/1 is unreachable (FaultCode:fltAdaptorUnitAdaptorReachability, FaultIndex:3695955)

S = Major

N = 10.13.2.8

EN = fltEtherSwitchIntFIoSatelliteConnectionAbsent

ET = UCS_PortsLinks

No link between IOM port 3/1/1 and fabric interconnect A:1/9 (FaultCode:fltEtherSwitchIntFIoSatelliteConnectionAbsent, FaultIndex:3468974)

S = Major

N = 10.13.2.8

EN = fltDcxVcMgmtVifDown

ET = UCS_Mgmt_Link

IOM 3 / 1 (A) management VIF 3 down, reason None (FaultCode:fltDcxVcMgmtVifDown, FaultIndex:3468977)

S = Indeterminate

N = 10.13.2.10

EN = fltPortPIoLinkDown

ET = UCS_Etherne

Ether port 10 on fabric interconnect A oper state: link-down, reason: Link failure or notconnected (FaultCode:fltPortPIoLinkDown,FaultIndex).

S = Major

N = 10.13.2.10

EN = fltEquipmentIOCardUnsupportedConnectivity

ET = default

IOM 3/2 (B) current connectivity does not match discovery policy: unsupported connectivity(FaultCode:fltEquipmentIOCardUnsupportedConnectivity

S = Major

N = 10.13.2.8

EN = fltEquipmentIOCardUnsupportedConnectivity

ET = default

IOM 3/1 (A) current connectivity does not match discovery policy: unsupported-connectivity (FaultCode:fltEquipmentIOCardUnsupportedConnectivity, FaultIndex:3695902).

S = Major

N = 10.13.2.8

EN = fltLsServerInaccessible

ET = default

Service profile c3b1 cannot be accessed (FaultCode:fltLsServerInaccessible, FaultIndex:3695961)

S = Warning

C = C080

EN = RTMTDataMissing

ET = OM_CUCM_OM_Connectivity

RTMTDataMissing::Component= VECUCM-CL-C080-1; CallManagerList= 192.6.4.186,192.6.4.193,192.6.4.188,192.6.4.187; ReasonForRTMTDataMissing= Unable to communicate with RTMT on publisher; CustomerName= C080; Default Event Name= RTMTDataMissing; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=RTMTDataMissing >;

S = Warning

C = C080

EN = RTMTDataMissing

ET = Default

RTMTDataMissing::Component= cucxn-80-pub.customer.com; Name= cucxn-80-pub.customer.com; HostDescription= Hardware:VMware, 1 Intel(R) Xeon(R) CPU X5680 @ 3.33GHz, 6144 MB Memory: Software:UCOS 5.0.0.0-2; Reason= Error collecting RTMT data. Error reason: HTTP communication error; Default Event Name= RTMTDataMissing; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=RTMTDataMissing >;
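
The UCS rows in this table end with a parenthesized FaultCode and, where the text was not cut off, a FaultIndex. The following Python sketch shows one possible way to pull those two values out of a summary so that they can be looked up in UCSM. It is an illustration only: the regular expression and the name extract_fault are assumptions, and a summary whose FaultIndex is cut off yields None for the index rather than a guessed value.

```python
import re

# FaultCode is required; FaultIndex may be missing or cut off in the text.
FAULT = re.compile(
    r"FaultCode:\s*(?P<code>\w+)"
    r"(?:\s*,\s*FaultIndex:?\s*(?P<index>\d+)?)?"
)


def extract_fault(summary: str):
    """Return (fault_code, fault_index or None), or None if no FaultCode."""
    match = FAULT.search(summary)
    if match is None:
        return None
    index = match.group("index")
    return match.group("code"), (int(index) if index else None)


complete = ("Service profile c3b1 cannot be accessed "
            "(FaultCode:fltLsServerInaccessible, FaultIndex:3695961)")
cut_off = ("Adapter uplink interface 3/4/1/1 link state: unavailable"
           "(FaultCode:fltAdaptorExtIfLinkDown,FaultIndex")

print(extract_fault(complete))
print(extract_fault(cut_off))
```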


Service Tree Event Overlay Location and Content

Table A-60 Observed Service Tree Events for UC19 

Location
Summary

...-> Cluster_Availability --> Node:CUP-80-pub

PerformancePollingStopped::Component= cup-80-pub.customer.com; Error Message String= 29-Jun-2012 14:15:49 EDT,cup-80-pub.customer.com,192.6.4.191,Cannot collect data. The device is experiencing communication problems. Device may be in partially monitored state. Check HTTP(S) credentials.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=PerformancePollingStopped >;

...-> Cluster_Availability--> Node:CUP-80-pub

Unresponsive::Component= cup-80-pub.customer.com; SystemObjectID= .1.3.6.1.4.1.99.1.1.3.28; Description= Hardware:VMware, 4 Intel(R) Xeon(R) CPU X5680 @ 3.33GHz, 6144 MB Memory: Software:UCOS 4.0.0.0-44; DiscoveredFirstAt= 06-28-2012 15:58:29; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.4.191; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 06-28-2012 18:06:25; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=Unresponsive >;

...-> Cluster_Availability--> Sub: CUCM-80-sub1

Unresponsive::Component= cucm-80-sub1.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Linux release:2.6.18-194.26.1.el5PAE machine:i686; DiscoveredFirstAt= 06-28-2012 15:58:42; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.4.187; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 06-28-2012 18:06:34; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=Unresponsive >;

...-> Cluster_Availability--> Sub: CUCM-80-sub1

PerformancePollingStopped::Component= cucm-80-sub1.customer.com; Error Message String= 29-Jun-2012 14:15:49 EDT,cucm-80-sub1.customer.com,192.6.4.187,Cannot collect data. The device is experiencing communication problems. Device may be in partially monitored state. Check HTTP(S) credentials.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=PerformancePollingStopped >;

...-> Cluster_Availability--> Sub: CUCM-80-sub2

PerformancePollingStopped::Component= cucm-80-sub2.customer.com; Error Message String= 29-Jun-2012 14:15:49 EDT,cucm-80-sub2.customer.com,192.6.4.188,Cannot collect data. The device is experiencing communication problems. Device may be in partially monitored state. Check HTTP(S) credentials.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=PerformancePollingStopped >;

...-> Cluster_Availability--> Sub: CUCM-80-sub2

Unresponsive::Component= cucm-80-sub2.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Linux release:2.6.18-194.26.1.el5PAE machine:i686; DiscoveredFirstAt= 06-28-2012 15:58:33; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.4.188; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 06-29-2012 12:09:39; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=Unresponsive >;

...-> Cluster_Availability--> Pub: CUCM-80-pub

PerformancePollingStopped::Component= cucm-80-pub.customer.com; Error Message String= 29-Jun-2012 14:15:49 EDT,cucm-80-pub.customer.com,192.6.4.186,Cannot collect data. The device is experiencing communication problems. Device may be in partially monitored state. Check HTTP(S) credentials.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=PerformancePollingStopped >;

...-> Cluster_Availability--> Pub: CUCM-80-pub

Unresponsive::Component= cucm-80-pub.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Linux release:2.6.18-194.26.1.el5PAE machine:i686; DiscoveredFirstAt= 06-28-2012 15:58:33; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.4.186; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 06-29-2012 12:09:39; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/

...-> Cluster_Availability--> Sub: CUCxn-80-sub

Unresponsive::Component= cucxn-80-sub.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Hardware:VMware, 1 Intel(R) Xeon(R) CPU X5680 @ 3.33GHz, 6144 MB Memory: Software:UCOS 5.0.0.0-2; DiscoveredFirstAt= 06-28-2012 15:58:33; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.4.190; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 06-28-2012 18:06:45; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=Unresponsive >;

...-> Cluster_Availability--> Sub: CUCxn-80-sub

PerformancePollingStopped::Component= cucxn-80-sub.customer.com; Error Message String= 29-Jun-2012 14:15:49 EDT,cucxn-80-sub.customer.com,192.6.4.190,Cannot collect data. The device returned no data from a required MIB.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=PerformancePollingStopped >;

...-> Cluster_Availability--> Pub: CUCxn-80-pub

PerformancePollingStopped::Component= cucxn-80-pub.customer.com; Error Message String= 29-Jun-2012 14:15:49 EDT,cucxn-80-pub.customer.com,192.6.4.189,Cannot collect data. The device returned no data from a required MIB.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=PerformancePollingStopped >;

...-> Cluster_Availability--> Pub: CUCxn-80-pub

Unresponsive::Component= cucxn-80-pub.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Hardware:VMware, 1 Intel(R) Xeon(R) CPU X5680 @ 3.33GHz, 6144 MB Memory: Software:UCOS 5.0.0.0-2; DiscoveredFirstAt= 06-28-2012 15:58:37; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.4.189; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 06-28-2012 18:06:38; Default Event Name= Unresponsive; DescriptionURL=< http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=Unresponsive >;

...-> VM Availability

The virtual machine CUCxn-80-sub running on host 10.13.3.31 is Disconnected. Message: KVM_VM_Disconnected_Cisco_HCM[(Event_Type=N"VmDisconnectedEvent" ON VM:tb3-vcent-10.13.3.31:ESX ON 2301156 (Event_Type=VmDisconnectedEvent)]

...-> VM Availability

The virtual machine CUCM-80-sub1 running on host 10.13.3.34 is Disconnected. Message: KVM_VM_Disconnected_Cisco_HCM[(Event_Type=N"VmDisconnectedEvent" ON VM:tb3-vcent-10.13.3.34:ESX ON 2301145 (Event_Type=VmDisconnectedEvent)]

...-> VM Availability

The virtual machine CUCM-80-sub2 running on host 10.13.3.33 is Disconnected. Message: KVM_VM_Disconnected_Cisco_HCM[(Event_Type=N"VmDisconnectedEvent" ON VM:tb3-vcent-10.13.3.33:ESX ON 2301136 (Event_Type=VmDisconnectedEvent)]

...-> VM Availability

The virtual machine CUP-80-pub running on host 10.13.3.31 is Disconnected. Message: KVM_VM_Disconnected_Cisco_HCM[(Event_Type=N"VmDisconnectedEvent" ON VM:tb3-vcent-10.13.3.31:ESX ON 2301155 (Event_Type=VmDisconnectedEvent)]

...-> VM Availability

The virtual machine CUCM-80-pub running on host 10.13.3.31 is Disconnected. Message: KVM_VM_Disconnected_Cisco_HCM[(Event_Type=N"VmDisconnectedEvent" ON VM:tb3-vcent-10.13.3.31:ESX ON 2301158 (Event_Type=VmDisconnectedEvent)]

...-> VM Availability

The virtual machine CUCxn-80-pub running on host 10.13.3.32 is Disconnected. Message: KVM_VM_Disconnected_Cisco_HCM[(Event_Type=N"VmDisconnectedEvent" ON VM:tb3-vcent-10.13.3.32:ESX ON 2301185 (Event_Type=VmDisconnectedEvent)]

...-> VoiceService

Meta event for Voice Service - C080
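
The VM Availability rows above each name a virtual machine and the ESXi host it was running on. As a rough illustration of how those rows can be summarized, the following Python sketch groups the disconnected VMs by host. The name vms_by_host is an assumption for this example, and the sample strings keep only the first sentence of each summary.

```python
import re
from collections import defaultdict

DISCONNECTED = re.compile(
    r"The virtual machine (?P<vm>\S+) running on host "
    r"(?P<host>\d+(?:\.\d+){3}) is Disconnected"
)


def vms_by_host(summaries):
    """Map each host IP address to the VMs reported as disconnected on it."""
    hosts = defaultdict(list)
    for summary in summaries:
        match = DISCONNECTED.search(summary)
        if match:
            hosts[match.group("host")].append(match.group("vm"))
    return dict(hosts)


summaries = [
    "The virtual machine CUCxn-80-sub running on host 10.13.3.31 is Disconnected.",
    "The virtual machine CUCM-80-sub1 running on host 10.13.3.34 is Disconnected.",
    "The virtual machine CUCM-80-sub2 running on host 10.13.3.33 is Disconnected.",
    "The virtual machine CUP-80-pub running on host 10.13.3.31 is Disconnected.",
    "The virtual machine CUCM-80-pub running on host 10.13.3.31 is Disconnected.",
    "The virtual machine CUCxn-80-pub running on host 10.13.3.32 is Disconnected.",
]
grouped = vms_by_host(summaries)
for host, vms in sorted(grouped.items()):
    print(host, ", ".join(vms))
```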


Next Steps


Step 1 Cross-launch to the domain manager UCSM to confirm that the chassis is powered off. Power on the chassis to clear the events.


UC20 - UCS Chassis Failure - CUCxn

This use case describes the events that Prime Central for HCS receives if the chassis hosting CUCxn nodes loses power. Prime Central for HCS performs SIA and RCA for this use case. The CUCxn VM is automatically brought up in another host if HA is enabled on the cluster. If HA is not configured for the cluster, CUCxn nodes stay down until the chassis is powered on.

For the following example, the same chassis hosts all UC VMs for customer 80.

Observed RC-EL Events

When the chassis powers off, numerous synthetic RCA events are observed, including UCS_Chassis_Fault, UCS_Blade_Avlblty, VC_Host_Avlblty, and UCS_BladeLinks. Eventually, only one synthetic RCA event, UCS_Chassis_Fault, remains as the root cause.

Table A-61 Observed Root Cause Events for UC20

Severity
EventTypeID
Summary

Critical

UCS_Chassis_Fault

Synthetic Event for UCS_Chassis_Fault group events from 10.13.2.10

Critical

OM_CUCM_OM_Connectivity

Synthetic Event for OM_CUCM_OM_Connectivity group events from CUCM-CL-C080-1

Critical

VC_Host_Avlblty

Synthetic Event for VC_Host_Avlblty group events from 10.13.3.31


Observed SI-EL Events

CUCM voice service impacts voice mail and presence. In this example, CUCM VMs are hosted in the same chassis and all voice, voice mail, and presence services are affected because of this dependency. If only a CUCxn cluster is on the failed chassis, only CUCxn service is affected. The following table presents the list of events observed during internal testing.

Table A-62 Observed Service Events for UC20

Severity
Summary

Critical

Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C080-1 is Bad.

Critical

Overall Attribute of the Customer_Presence_Service_Template tag of CUP-80-pub is Bad.

Critical

Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C080-1 is Bad.


Observed Other-EL Events

Prime Central for HCS does not analyze these events, but they could point to potential root causes for impacted services. The following table presents the list of events observed during internal testing.


Note 10.11.2.8, 10.11.2.9, and 10.11.2.10 are the UCS6140 side A, UCS6140 side B, and UCSM IP addresses, respectively. Georedundancytemp-sa is the name of the vCenter cluster that contains the C080 customer VMs.


Table A-63 Observed Other Events for UC20 (No HA)

Severity (S)/Customer (C)/Node (N)
EventName (EN)/EventTypeId (ET)
Summary

S = Major

N = 10.13.2.8

EN = fltAdaptorExtIfLinkDown

ET = UCS_Adapter

Adapter uplink interface 3/1/1/1 link state: unavailable(FaultCode:fltAdaptorExtIfLinkDown,FaultIndex:

S = Indeterminate

N = 10.13.2.8

EN = fltAdaptorUnitAdaptorReachability

ET = default

Adapter 3/1/1 is unreachable(FaultCode:fltAdaptorUnitAdaptorReachability

S = Major

N = 10.13.2.10

EN = fltEtherSwitchIntFIoSatelliteConnectionAbsent

ET = UCS_PortsLinks

No link between IOM port 3/1/1 and fabric interconnect A:1/9(FaultCode:fltEtherSwitchIntFIoSatelliteConnectionAbsent

S = Major

N = 10.13.2.10

EN = fltDcxVcMgmtVifDown

ET = UCS_Mgmt_Link

IOM 3 / 1 (A) management VIF 3 down, reason None(FaultCode:fltDcxVcMgmtVifDown,FaultIndex

S = Major

N = 10.13.2.10

EN = fltPortPIoLinkDown

ET = UCS_Etherne

Ether port 10 on fabric interconnect A oper state: link-down, reason: Link failure or notconnected (FaultCode:fltPortPIoLinkDown,FaultIndex

S = Major

N = 10.13.2.8

EN = fltEquipmentIOCardUnsupportedConnectivity

ET = default

IOM 3/1 (A) current connectivity does not match discovery policy: unsupported-connectivity (FaultCode:fltEquipmentIOCardUnsupportedConnectivity, FaultIndex:3695902)

S = Major

N = 10.13.2.8

EN = fltEquipmentIOCardUnsupportedConnectivity

ET = default

IOM 3/1 (A) current connectivity does not match discovery policy: unsupported connectivity(FaultCode:fltEquipmentIOCardUnsupportedConnectivity,FaultIndex:3695902)

S = Major

N = 10.13.2.8

EN = fltLsServerInaccessible

ET = default

Service profile c3b1 cannot be accessed (FaultCode:fltLsServerInaccessible, FaultIndex:3695961)

S = Warning

C = C080

EN = RTMTDataMissing

ET = OM_CUCM_OM_Connectivity

RTMTDataMissing::Component= VECUCM-CL-C080-1; CallManagerList= 192.6.4.186,192.6.4.193,192.6.4.188,192.6.4.187; ReasonForRTMTDataMissing= Unable to communicate with RTMT on publisher; CustomerName= C080; Default Event Name= RTMTDataMissing; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=RTMTDataMissing >;

S = Warning

C = C080

EN = RTMTDataMissing

ET = Default

RTMTDataMissing::Component= cucxn-80-pub.customer.com; Name= cucxn-80-pub.customer.com; HostDescription= Hardware:VMware, 1 Intel(R) Xeon(R) CPU X5680 @ 3.33GHz, 6144 MB Memory: Software:UCOS 5.0.0.0-2; Reason= Error collecting RTMT data. Error reason: HTTP communication error; Default Event Name=RTMTDataMissing; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=RTMTDataMissing >;


The following table presents the list of events observed during internal testing, if HA is enabled on the cluster.

Table A-64 Observed Other Events for UC20 (HA) 

Severity (S)/Customer (C)/Node (N)
EventName (EN)/EventTypeId (ET)
Summary

S = Warning

N = georedundancytemp-sa

EN = KVM_Cluster_Effective_CPU_Low

ET = VC_Cluster_Resources

The effective CPU amount of the cluster is low on georedundancytemp-sa. Message: KVM_Cluster_Effective_CPU_Low[(Percent_Effective_AND Percent_Effective_CPU<50) ON tb3-vcenter:hcm-es-itm-m2:VM ON tb3 (Percent_Effective_CPU=29)]

S = Warning

N = georedundancytemp-sa

EN = KVM_Cluster_Effective_Mem_Low

ET = VC_Cluster_Resources

The effective memory of the cluster amount is low on georedundancytemp-sa. Message: KVM_Cluster_Effective_Mem_Low[(Percent_Effective AND Percent_Effective_Memory<50) ON tb3-vcenter:hcm-es-itm-m2:VM ON tb3 (Percent_Effective_Memory=30)]

S = Warning

C = C080

N = CUP-80-pub

EN = KVM_VM_RestartOnAlt_Host_Cisco

ET = VC_VM_Restored

Virtual machine CUP-80-pub was restarted on 10.13.3.12 since 10.13.3.31 failed. Message: KVM_VM_RestartOnAlt_Host_Cisco[(Event_Type=N"VmRestartedOnAlternateHostEvent" ON VM:tb3-vcent-10.13.3.12:ESX ON 2312823 (Event_Type=VmRestartedOnAlternateHostEvent)]

S = Warning

C = C080

N = CUCM-80-pub

EN = KVM_VM_RestartOnAlt_Host_Cisco

ET = VC_VM_Restored

Virtual machine CUCM-80-pub was restarted on 10.13.3.15 since 10.13.3.31 failed. Message: KVM_VM_RestartOnAlt_Host_Cisco[(Event_Type=N"VmRestartedOnAlternateHostEvent" ON VM:tb3-vcent-10.13.3.15:ESX ON 2312730 (Event_Type=VmRestartedOnAlternateHostEvent

S = Warning

C = C080

N = CUCM-80-sub2

EN = KVM_VM_RestartOnAlt_Host_Cisco

ET = VC_VM_Restored

Virtual machine CUCM-80-sub2 was restarted on 10.13.3.12 since 10.13.3.31 failed. Message: KVM_VM_RestartOnAlt_Host_Cisco[(Event_Type=N"VmRestartedOnAlternateHostEvent" ON VM:tb3-vcent-10.13.3.12:ESX ON 2312744 (Event_Type=VmRestartedOnAlternateHostEvent)]

S = Warning

C = C080

N = CUCM-80-sub1

EN = KVM_VM_RestartOnAlt_Host_Cisco

ET = VC_VM_Restored

Virtual machine CUCM-80-sub1 was restarted on 10.13.3.12 since 10.13.3.31 failed. Message: KVM_VM_RestartOnAlt_Host_Cisco[(Event_Type=N"VmRestartedOnAlternateHostEvent" ON VM:tb3-vcent-10.13.3.12:ESX ON 2312743 (Event_Type=VmRestartedOnAlternateHostEvent)]

S = Warning

C = C080

N = CUCxn-80-pub

EN = KVM_VM_RestartOnAlt_Host_Cisco

ET = VC_VM_Restored

Virtual machine CUCxn-80-pub was restarted on 10.13.3.15 since 10.13.3.31 failed. Message: KVM_VM_RestartOnAlt_Host_Cisco[(Event_Type=N"VmRestartedOnAlternateHostEvent" ON VM:tb3-vcent-10.13.3.15:ESX ON 2312727 (Event_Type=VmRestartedOnAlternateHostEvent)]
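
When HA is enabled, each VC_VM_Restored summary states which VM moved, where it was restarted, and which host failed. The following Python sketch is one hedged way to turn those sentences into a VM-to-host map for a quick check that every expected VM came back. The name restarted_vms is an assumption for this example, and the sample strings keep only the first sentence of each summary, with line-wrap spaces removed from the VM names.

```python
import re

RESTARTED = re.compile(
    r"Virtual machine (?P<vm>\S+) was restarted on "
    r"(?P<new>\d+(?:\.\d+){3}) since (?P<old>\d+(?:\.\d+){3}) failed"
)


def restarted_vms(summaries):
    """Map each VM name to a (failed_host, new_host) pair."""
    moves = {}
    for summary in summaries:
        match = RESTARTED.search(summary)
        if match:
            moves[match.group("vm")] = (match.group("old"), match.group("new"))
    return moves


summaries = [
    "Virtual machine CUP-80-pub was restarted on 10.13.3.12 since 10.13.3.31 failed.",
    "Virtual machine CUCM-80-pub was restarted on 10.13.3.15 since 10.13.3.31 failed.",
    "Virtual machine CUCxn-80-pub was restarted on 10.13.3.15 since 10.13.3.31 failed.",
]
moves = restarted_vms(summaries)
for vm, (failed_host, new_host) in moves.items():
    print(f"{vm}: {failed_host} -> {new_host}")
```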


Service Tree Event Overlay Location and Content

The following table presents the list of events observed during internal testing.

Table A-65 Observed Service Tree Events for UC20 

Location
Summary

...-> Cluster_Availability--> Node:CUP-80-pub

PerformancePollingStopped::Component= cup-80-pub.customer.com; Error Message String= 29-Jun-2012 14:15:49 EDT,cup-80-pub.customer.com,192.6.4.191,Cannot collect data. The device is experiencing communication problems. Device may be in partially monitored state. Check HTTP(S) credentials.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=PerformancePollingStopped >;

...-> Cluster_Availability--> Node:CUP-80-pub

Unresponsive::Component= cup-80-pub.customer.com; SystemObjectID= .1.3.6.1.4.1.99.1.1.3.28; Description= Hardware:VMware, 4 Intel(R) Xeon(R) CPU X5680 @ 3.33GHz, 6144 MB Memory: Software:UCOS 4.0.0.0-44; DiscoveredFirstAt= 06-28-2012 15:58:29; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.4.191; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 06-28-2012 18:06:25; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=Unresponsive >;

...-> Cluster_Availability--> Sub: CUCM-80-sub1

Unresponsive::Component= cucm-80-sub1.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Linux release:2.6.18-194.26.1.el5PAE machine:i686; DiscoveredFirstAt= 06-28-2012 15:58:42; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.4.187; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 06-28-2012 18:06:34; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=Unresponsive >;

...-> Cluster_Availability--> Sub: CUCM-80-sub1

PerformancePollingStopped::Component= cucm-80-sub1.customer.com; Error Message String= 29-Jun-2012 14:15:49 EDT,cucm-80-sub1.customer.com,192.6.4.187,Cannot collect data. The device is experiencing communication problems. Device may be in partially monitored state. Check HTTP(S) credentials.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=PerformancePollingStopped >;

...-> Cluster_Availability--> Sub: CUCM-80-sub2

PerformancePollingStopped::Component= cucm-80-sub2.customer.com; Error Message String= 29-Jun-2012 14:15:49 EDT,cucm-80-sub2.customer.com,192.6.4.188,Cannot collect data. The device is experiencing communication problems. Device may be in partially monitored state. Check HTTP(S) credentials.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=PerformancePollingStopped >;

...-> Cluster_Availability--> Sub: CUCM-80-sub2

Unresponsive::Component= cucm-80-sub2.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Linux release:2.6.18-194.26.1.el5PAE machine:i686; DiscoveredFirstAt= 06-28-2012 15:58:33; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.4.188; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 06-29-2012 12:09:39; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=Unresponsive >;

...-> Cluster_Availability--> Pub: CUCM-80-pub

PerformancePollingStopped::Component= cucm-80-pub.customer.com; Error Message String= 29-Jun-2012 14:15:49 EDT,cucm-80-pub.customer.com,192.6.4.186,Cannot collect data. The device is experiencing communication problems. Device may be in partially monitored state. Check HTTP(S) credentials.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=PerformancePollingStopped >;

...-> Cluster_Availability--> Pub: CUCM-80-pub

Unresponsive::Component= cucm-80-pub.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Linux release:2.6.18-194.26.1.el5PAE machine:i686; DiscoveredFirstAt= 06-28-2012 15:58:33;Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.4.186; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 06-29-2012 12:09:39; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=Unresponsive >;

...-> Cluster_Availability--> Sub: CUCxn-80-sub

Unresponsive::Component= cucxn-80-sub.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Hardware:VMware, 1 Intel(R) Xeon(R) CPU X5680 @ 3.33GHz, 6144 MB Memory: Software:UCOS 5.0.0.0-2; DiscoveredFirstAt= 06-28-2012 15:58:33; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.4.190; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 06-28-2012 18:06:45; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=Unresponsive >;

...-> Cluster_Availability--> Sub: CUCxn-80-sub

PerformancePollingStopped::Component= cucxn-80-sub.customer.com; Error Message String= 29-Jun-2012 14:15:49 EDT,cucxn-80-sub.customer.com,192.6.4.190,Cannot collect data. The device returned no data from a required MIB.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=PerformancePollingStopped >;

...-> Cluster_Availability--> Pub: CUCxn-80-pub

PerformancePollingStopped::Component= cucxn-80-pub.customer.com; Error Message String= 29-Jun-2012 14:15:49 EDT,cucxn-80-pub.customer.com,192.6.4.189,Cannot collect data. The device returned no data from a required MIB.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=PerformancePollingStopped >;

...-> Cluster_Availability--> Pub: CUCxn-80-pub

Unresponsive::Component= cucxn-80-pub.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Hardware:VMware, 1 Intel(R) Xeon(R) CPU X5680 @ 3.33GHz, 6144 MB Memory: Software:UCOS 5.0.0.0-2; DiscoveredFirstAt= 06-28-2012 15:58:37; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.4.189; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 06-28-2012 18:06:38; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/ com.cisco.nm.help.ServerHelpEngine? tag=Unresponsive >;

...-> VM Availability

The virtual machine CUCxn-80-sub running on host 10.13.3.31 is Disconnected. Message: KVM_VM_Disconnected_Cisco_HCM[(Event_Type=N"VmDisconnectedEvent" ON VM:tb3-vcent-10.13.3.31:ESX ON 2301156 (Event_Type=VmDisconnectedEvent)]

...-> VM Availability

The virtual machine CUCM-80-sub1 running on host 10.13.3.34 is Disconnected. Message: KVM_VM_Disconnected_Cisco_HCM[(Event_Type=N"VmDisconnectedEvent" ON VM:tb3-vcent-10.13.3.34:ESX ON 2301145 (Event_Type=VmDisconnectedEvent)]

...-> VM Availability

The virtual machine CUCM-80-sub2 running on host 10.13.3.33 is Disconnected. Message: KVM_VM_Disconnected_Cisco_HCM[(Event_Type=N"VmDisconnectedEvent" ON VM:tb3-vcent-10.13.3.33:ESX ON 2301136 (Event_Type=VmDisconnectedEvent)]

...-> VM Availability

The virtual machine CUP-80-pub running on host 10.13.3.31 is Disconnected. Message: KVM_VM_Disconnected_Cisco_HCM[(Event_Type=N"VmDisconnectedEvent" ON VM:tb3-vcent-10.13.3.31:ESX ON 2301155 (Event_Type=VmDisconnectedEvent)]

...-> VM Availability

The virtual machine CUCM-80-pub running on host 10.13.3.31 is Disconnected. Message: KVM_VM_Disconnected_Cisco_HCM[(Event_Type=N"VmDisconnectedEvent" ON VM:tb3-vcent-10.13.3.31:ESX ON 2301158 (Event_Type=VmDisconnectedEvent)]

...-> VM Availability

The virtual machine CUCxn-80-pub running on host 10.13.3.32 is Disconnected. Message: KVM_VM_Disconnected_Cisco_HCM[(Event_Type=N"VmDisconnectedEvent" ON VM:tb3-vcent-10.13.3.32:ESX ON 2301185 (Event_Type=VmDisconnectedEvent)]

...-> VoiceService

Meta event for Voice Service - C080


Next Steps


Step 1 Cross-launch to the domain manager UCSM to confirm that the chassis is powered off. Power on the chassis to clear the events.


UC21 - Insufficient Virtual Memory

This use case describes the events that Prime Central for HCS receives if a CUCM server runs out of virtual memory. This type of incident generates Service Impact (SI) events.

Observed RC-EL Events

None.

Observed SI-EL Events

CUCM voice service impacts voice mail and presence. The following table presents the list of events observed during internal testing.

Table A-66 Observed Service Events for UC21

Severity
Summary

Minor

Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C071-1 is Marginal.

Minor

Overall Attribute of the Customer_Presence_Service_Template tag of CUP-CL-C071-1 is Marginal.

Minor

Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C071-1 is Marginal.


Observed Other-EL Events

None.

Service Tree Event Overlay Location and Content

SIA events are overlaid on the Service Tree in the Service Availability view. The following table presents the list of events observed during internal testing.

Table A-67 Observed Service Tree Events for UC21 

Location
Summary

...-> Voice Service

Meta event for Voice Service - C071

...-> Application Resources

LowAvailableVirtualMemory::Component= VMEM-cucm-71-pub.customer.com/ Memory; VmPercentageUsed= 84; LowAvailableVirtualMemoryThreshold= 25; Default Event Name= LowAvailableVirtualMemory; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=LowAvailableVirtualMemory >;

...-> VM_Resources

The virtual machine guest memory usage is high on CUCM-71-pub. Message: KVM_VM_Guest_Memory_Util_High[(Guest_Util>40) ON VM:cisco-10.11.3.152:ESX ON CUCM-71-pub (Guest_Util=66)]
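
The Application Resources summaries shown in this appendix mostly follow an `EventName::Key= value; Key= value;` layout, so they can be split into fields for filtering or reporting. The following is a minimal Python sketch that assumes this layout holds; `parse_event_summary` is a hypothetical helper written for illustration and is not part of Prime Central for HCS.

```python
def parse_event_summary(summary):
    """Split 'Name::Key= value; Key= value;' into (name, fields).

    Assumes fields are separated by semicolons and that the first '='
    in each field separates the key from the value.
    """
    name, _, rest = summary.partition("::")
    fields = {}
    for part in rest.split(";"):
        key, sep, value = part.partition("=")
        if sep:  # ignore empty segments such as the one after the last ';'
            fields[key.strip()] = value.strip()
    return name.strip(), fields


# Sample text copied from the UC21 Application Resources row (URL omitted).
sample = (
    "LowAvailableVirtualMemory::Component= VMEM-cucm-71-pub.customer.com/ Memory; "
    "VmPercentageUsed= 84; LowAvailableVirtualMemoryThreshold= 25; "
    "Default Event Name= LowAvailableVirtualMemory;"
)
name, fields = parse_event_summary(sample)
print(name, fields["VmPercentageUsed"], fields["LowAvailableVirtualMemoryThreshold"])
```

Summaries that do not separate fields with semicolons need additional handling.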


Next Steps


Step 1 Right-click the Synthetic RC event > Show Contained Events to display the corresponding raw events.

Step 2 Right-click the Raw event > Event Details and select Next Steps to display the following recommendation:

Check CUCM Windows Task Manager or the RTMT tool to verify insufficient memory. This event 
may be caused by a memory leak. It is important to identify which process is using 
excessive memory. After the process is identified, if you suspect a memory leak (for 
example, if memory use for a process increases continually, or a process uses more memory 
than it should), you may want to contact your support team. 

UC22 - CPU Utilization Problems

This use case describes the events that Prime Central for HCS receives if a CUCM server has a heavy load on its CPU. This type of incident generates Service Impact (SI) events.

Observed RC-EL Events

None.

Observed SI-EL Events

CUCM voice service impacts voice mail and presence.

Table A-68 Observed Service Events for UC22

Severity
Summary

Minor

Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C071-1 is Marginal.

Minor

Overall Attribute of the Customer_Presence_Service_Template tag of CUP-CL-C071-1 is Marginal.

Minor

Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C071-1 is Marginal.


Observed Other-EL Events

None.

Service Tree Event Overlay Location and Content

SIA events are overlaid on the Service Tree in the Service Availability view. The following table presents the list of events observed during internal testing.

Table A-69 Observed Service Tree Events for UC22

Location
Summary

...-> Application Resources

CPUPegging::Component= PROCcucm-71-pub.customer.com/_Total; PercentageCPU= 99; TopProcessesDetails= tomcat(5%);RisDC(1%);cmoninit(1%); CallProcessingNodeCpuPeggingThreshold= 90; Default Event Name= CPUPegging; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=CPUPegging >;

...-> Voice Service

Meta event for Voice Service - C071

...-> VM_Resources

CPU use high on CUCM-71-pub. Message: KVM_VM_CPU_Util_High[(Utilization>90) ON VM:cisco-10.11.3.152:ESX ON CUCM-71-pub (Utilization=93)]
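
The threshold-style `KVM_VM_*` messages in these tables share a visible shape: a situation name, a bracketed condition, two `ON` clauses, and the observed value in parentheses. The sketch below pulls those parts out with a regular expression. It is an illustration based only on the sample messages in this appendix; the field names (`managed_system`, `target`, and so on) are labels chosen for this sketch, not documented product terminology.

```python
import re

KVM_MESSAGE = re.compile(
    r"(?P<situation>KVM_\w+)\s*\\?\["    # situation name, then '[' (tolerates a stray backslash)
    r"\((?P<condition>.+?)\)?"           # condition text, for example 'Utilization>90'
    r"\s+ON\s+(?P<managed_system>\S+)"   # first ON clause
    r"\s+ON\s+(?P<target>\S+)"           # second ON clause (the VM name in the samples)
    r"\s+\((?P<observed>[^)]*)\)\\?\]"   # observed value, for example 'Utilization=93'
)


def parse_kvm_message(text):
    """Return the parts of a threshold-style KVM message, or None if no match."""
    match = KVM_MESSAGE.search(text)
    if match is None:
        return None
    info = match.groupdict()
    metric, _, value = info.pop("observed").partition("=")
    info["metric"] = metric
    info["value"] = value
    return info


cpu_event = (
    "CPU use high on CUCM-71-pub. Message: KVM_VM_CPU_Util_High[(Utilization>90) "
    "ON VM:cisco-10.11.3.152:ESX ON CUCM-71-pub (Utilization=93)]"
)
print(parse_kvm_message(cpu_event))
```

Event-style messages, such as the VmDisconnectedEvent rows, use a different layout and are not covered by this pattern.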


Next Steps


Step 1 Right-click the Synthetic RC event > Show Contained Events to display the corresponding raw events.

Step 2 Right-click the Raw event > Event Details > Next Steps to display the following recommendation:

Check the Communications Manager Windows Task Manager or Real Time Monitoring Tool (RTMT) 
to verify CPU high utilization. The most common cause is one or more processes that use 
excessive CPU resources. The event has information on which process is using the most CPU. 
After the process is identified, you may want to take action, which could include: 
* Restarting the process
* Checking the trace setting for that process; using detailed trace level can take up 
excessive CPU resources
* Checking for events, such as Code Yellow, and launching Operations Manager synthetic 
tests, such as Dial Tone Test to see if there is any impact on call processing.
* You may want to take more drastic measures, such as stopping nonessential services.

For more information, see

http://www.cisco.com/en/US/products/sw/voicesw/ps556/products_tech_note09186a00808ef0f4.shtml

http://www.cisco.com/en/US/products/sw/voicesw/ps556/products_tech_note09186a00807f32e9.shtml

For a video tutorial on troubleshooting the CPUPegging event, click the E-Learning button in online help.

UC23 - Call Throttling Failures (Code Red)

This use case describes the events that Prime Central for HCS receives if a CUCM server has Call Throttling failures in the Code Red range. This type of incident generates Service Impact (SI) events.

Observed RC-EL Events

None.

Observed SI-EL Events

CUCM voice service impacted voice mail and presence.

The following table presents the list of events observed during internal testing.

Table A-70 Observed Service Events for UC23

Severity
Summary

Minor

Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C071-1 is Marginal.

Minor

Overall Attribute of the Customer_Presence_Service_Template tag of CUP-CL-C071-1 is Marginal.

Minor

Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C071-1 is Marginal.


Observed Other-EL Events

None.

Service Tree Event Overlay Location and Content

SIA events are overlaid on the Service Tree in the Service Availability view. The following table presents the list of events observed during internal testing.

Table A-71 Observed Service Tree Events for UC23 

Location
Summary

...-> Voice Service

Meta event for Voice Service - C071

...-> Application Resources

Code Red::Component= 192.6.4.123-System; Code Yellow Duration= 300 NumberOfCallsRejectedDueToCallThrottling=0 TotalCodeYellowEntry=2 HighPriorityQueueDepth=0 NormalPriorityQueueDepth=0 LowPriorityQueueDepth=0 AppID=Cisco CallManager ClusterID=CUCM-CL-C071-1 NodeID=CUCM-71-pub : Unified CM has entered Code Red condition and will restart; Default Event Name= Code Red; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=CodeRed >;


Next Steps

Generally, repeated call throttling events require assistance. CUCM SDI and SDL trace files record call-throttling events and can provide useful information. Your support team may request these trace files for closer examination.


Step 1 Right-click the Synthetic RC event > Show Contained Events to display the corresponding raw events.

Step 2 Right-click the Raw event > Event Details and select Next Steps to display the following recommendation:

When CUCM enters a Code Red state, the CUCM service restarts and produces a memory dump 
that may be helpful for analyzing the failure.

Note Events are cleared automatically after 24 hours. Alternatively, clear the event manually on Unified Operations Manager after you rectify the fault.



UC24 - Call Throttling Failures (Code Yellow)

This use case describes the events that Prime Central for HCS receives if a CUCM server has Call Throttling failures in the Code Yellow range. This type of incident generates Service Impact (SI) events.

Observed RC-EL Events

None.

Observed SI-EL Events

CUCM voice service impacts voice mail and presence.

The following table presents the list of events observed during internal testing.

Table A-72 Observed Service Events for UC24

Severity
Summary

Minor

Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C071-1 is Marginal.


Observed Other-EL Events

None.

Service Tree Event Overlay Location and Content

SIA events are overlaid on the Service Tree in the Service Availability view. The following table presents the list of events observed during internal testing.

Table A-73 Observed Service Tree Events for UC24

Location
Summary

...-> Voice Service

Meta event for Voice Service - C071

...-> Application Resources

Code Yellow::Component= 192.6.4.123-System; Exit Latency= 8; Expected Average Delay= 0; Total Code Yellow Entry= 4; Entry Latency= 20; Sample Size= 10; Default Event Name= Code Yellow; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=CodeYellow >;


Next Steps


Step 1 Right-click the Synthetic RC event > Show Contained Events to display the corresponding raw events.

Step 2 Right-click the Raw event > Event Details and select Next Steps to display the following recommendation:

While this event generates, check process CPU usage and memory usage. Check for call 
bursts and an increased number of registered devices (phones, gateways, and so on) 
generated.
Continuously monitor whether CUCM is out of the Code Yellow state. You can launch 
synthetic tests, such as the Dial Tone Test, to check for any impact on call processing.
To try to circumvent the possibility of a Code Yellow event, consider the possible causes 
of system overload, such as heavy call activity, low CPU availability for CUCM, routing 
loops, disk I/O limitations, disk fragmentation, and so on, and investigate those 
possibilities.

For more information, see Call Throttling and the Code Yellow State.

UC25 - Route List Exhausted

This use case describes the events that the Prime Central for HCS dashboard displays if calls on a route list fail because no channels are available for call routing. Prime Central for HCS performs SIA for this use case. This event alerts a network operator that calls to a particular destination are failing and that the condition demands immediate attention to stop further failures. This can happen for several reasons; for example, a remote IP address is not reachable on a SIP/H323 trunk, a gateway is not reachable, the call failed at the next call processing node across an IP trunk or TDM trunk, or a TDM trunk lacked sufficient channels for the call.

Observed RC-EL Events

None.

Observed SI-EL Events

A Route List Exhausted failure may not impact voice mail and presence services, but by default Prime Central for HCS indicates impact on voicemail and presence services if Voice service is impaired.

The following table presents the list of events observed during internal testing.

Table A-74 Observed Service Events for UC25

Severity
Summary

Minor

Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C070-1 is Marginal.

Minor

Overall Attribute of the Customer_Presence_Service_Template tag of CUP-CL-C070-1 is Marginal.

Minor

Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C070-1 is Marginal.


Observed Other-EL Events

None.

Service Tree Event Overlay Location and Content

SIA events are overlaid on the Service Tree in the Service Availability view. The following table presents the list of events observed during internal testing.

Table A-75 Observed Service Tree Events for UC25

Location
Summary

...-> Voice Service

Meta event for Voice Service - C070

...-> Call Control -->Resources

CUST_C070_CLS_CUCM_CUCMCL- C070-1 RouteGroups(RG-AGGR); Default Event Name= Route List Exhausted; DescriptionURL= <


Next Steps


Step 1 Right-click Raw event > Event Details > Next Steps to display the following recommendation:

Check the RTMT Syslog Viewer for verification and further details. Assess whether 
additional resources should be added in the indicated route.

UC26 - Media List Exhausted

This use case describes the events that Prime Central for HCS receives if calls fail because of unavailable media resources. Prime Central for HCS performs SIA for this use case. This event alerts network operators that calls requiring media resources such as Annunciator, Transcoder, Conference Bridge, and Music On Hold are failing.

Observed RC-EL Events

None.

Observed SI-EL Events

Media List Exhausted failures may not impact voice mail and presence services, but by default Prime Central for HCS indicates impact on voice mail and presence services if voice service is impaired. Table A-76 shows the Service Events observed during testing.

Table A-76 Observed Service Events for UC26

Severity
Summary

Minor

Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C070-1 is Marginal.

Minor

Overall Attribute of the Customer_Presence_Service_Template tag of CUP-CL-C070-1 is Marginal.

Minor

Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C070-1 is Marginal.


Observed Other-EL Events

None.

Service Tree Event Overlay Location and Content

SIA events are overlaid on the Service Tree in the Service Availability view. The following table presents the list of events observed during internal testing.

Table A-77 Observed Service Tree Events for UC26

Location
Summary

...-> Voice Service

Meta event for Voice Service - C070

...-> Call Control -->Resources

Media List Exhausted::Component= VE-CUCM-CL-C070-1-cucm-70-pub.customer.com--NULL_LIST; Media Resource Type= Annunciator; Media Resource List Name= NULL_LIST; Default Event Name= Media List Exhausted; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=MediaListExhausted >;


Next Steps


Step 1 Right-click Media List Exhausted > Event Details > Next Steps to display the following recommendation:

Install additional resources to the indicated media resource list. This event indicates a 
network failure or device failure.

UC27 - High Resource Utilization by all Customer Sites

This use case describes the event that Prime Central for HCS receives if the High Utilization of Resources by all Customer sites event is encountered. Prime Central for HCS performs SIA for this use case.

Observed RC-EL Events

None.

Observed SI-EL Events

High resource utilization may not impact voice mail and presence services, but by default Prime Central for HCS indicates impact on voice mail and presence services if voice service is impaired.

The following table presents the list of events observed during internal testing.

Table A-78 Observed Service Events for UC27 

Severity
Summary

Minor

Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C070-1 is Marginal.

Minor

Overall Attribute of the Customer_Presence_Service_Template tag of CUP-CL-C070-1 is Marginal.

Minor

Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C070-1 is Marginal.


Observed Other-EL Events

None.

Service Tree Event Overlay Location and Content

SIA events are overlaid on the Service Tree in the Service Availability view.

Table A-79 Observed Service Tree Events for UC27

Location
Summary

...-> Voice Service

Meta event for Voice Service - C070.

...-> Call Control -->Resources

HighResourceUtilization::Component= Transcoder-cucm-70-pub.customer.com; Threshold Value(%)= 10; Violation Value(%)= 20; Port or Resource Type= Transcoder; Default Event Name= HighResourceUtilization; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=HighResourceUtilization >;


Next Steps


Step 1 Right-click the HighResourceUtilization event > Event Details > Next Steps to display the following recommendation:

Assess whether you should install additional resources. While this event is generated, 
click the event ID to view event details and identify which resource exceeded the 
threshold.
Use the performance graph or RTMT (for CUCM) to monitor resource utilization in real time 
and over the past 72 hours to verify high utilization and determine whether you need to 
install additional resources.

UC28 - Memory, CPU, Disk Threshold Exceeded - CUCxn

This use case describes the events that Prime Central for HCS receives if there is a memory, CPU, or disk threshold exceeded issue on Unity Connection (CUCxn). Prime Central for HCS displays Service Impact (SI) events only for such incidents.

Observed RC-EL Events

None.

Observed SI-EL Events

These CUCxn issues affect Voice mail. The following table presents the list of events observed during internal testing.

Table A-80 Observed Service Events for UC28

Severity
Summary

Minor

Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C071-1 is Marginal.


Observed Other-EL Events

None.

Service Tree Event Overlay Location and Content

Table A-81 Observed Service Tree Events for UC28

Location
Summary

...-> ApplicationResources

InsufficientFreeMemory::Component= RAM-cucxn-71-pub.customer.com/1; RAMTotalSize= 3920 MB; FreePhysicalMemoryThreshold= 15; UsedRAM= 3488 MB; FreePhysicalMemoryInPercentage= 11 %; Default ;

...-> ApplicationResources

HighUtilization::Component= PSRcucxn-71-pub.customer.com/0; ProcessorUtilizationThreshold= 90; CpuUtilFiveMin= 99 %; Default Event Name= HighUtilization; DescriptionURL= < ;

...-> ApplicationResources

InsufficientFreeHardDisk::Component= DISK-cucxn-71-pub.customer.com/9; HardDiskTotalSize= 99404 MB; FreeHardDiskThreshold= 15; FreeHardDiskInPercentage= 11 %; HardDiskUsed= 87824 MB; Default ;
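
The figures in the memory and disk rows can be cross-checked with simple arithmetic: free capacity is the total minus the used amount, expressed as a percentage of the total and compared with the threshold. The sketch below assumes that the reported percentage is truncated to a whole number and that the event is raised when the free percentage falls below the threshold; both are assumptions inferred from the sample values, not documented behavior.

```python
def free_percent(total_mb, used_mb):
    """Free capacity as a whole-number percentage (fractional part dropped)."""
    return (total_mb - used_mb) * 100 // total_mb


def below_free_threshold(total_mb, used_mb, threshold_percent):
    """True when free capacity is under the threshold (strict comparison assumed)."""
    return free_percent(total_mb, used_mb) < threshold_percent


# RAMTotalSize= 3920 MB, UsedRAM= 3488 MB, FreePhysicalMemoryThreshold= 15
print(free_percent(3920, 3488), below_free_threshold(3920, 3488, 15))    # 11 True

# HardDiskTotalSize= 99404 MB, HardDiskUsed= 87824 MB, FreeHardDiskThreshold= 15
print(free_percent(99404, 87824), below_free_threshold(99404, 87824, 15))  # 11 True
```

Both results agree with the 11 % values reported in the table.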


Next Steps


Step 1 Right-click Synthetic RC event > Show Contained Events to display the corresponding raw events.

Step 2 Right-click Raw event > Event Details > Next Steps to display the following recommendation:

* Insufficient free memory
On Cisco IOS devices, run show memory to check memory utilization. Sometimes high memory 
utilization indicates a memory leak. Identify which process is using excessive memory and 
take action (including restarting the process). On other devices, close any unnecessary 
applications and stop the services that are not being used or are not required. 

* High utilization
Identify the processes using excessive CPU space. You may want to take action, which can 
include restarting the identified process or processes.

* Insufficient free disk space
Uninstall unnecessary applications, delete temporary files to free disk space, and clean 
up unnecessary files.

UC29 - Low Number Of Available Licenses - CUCxn

This use case describes the events that Prime Central for HCS displays if there are few available licenses in CUCxn. Prime Central for HCS displays Service Impact (SI) events for such incidents.

Observed RC-EL Events

None.

Observed SI-EL Events

Unavailable CUCxn licenses impact the voice mail service.

Table A-82 Observed Service Events for UC29

Severity
Summary

Minor

Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C072-1 is Marginal.


Observed Other-EL Events

None.

Service Tree Event Overlay Location and Content

Table A-83 Observed Service Tree Events for UC29

Location
Summary

...-> VoicemailResources

Subscriber License Violated::Component= 192.6.4.133-System; Detail= CUCxn-72-sub]: An insufficient license violation has occurred. For details, open the Licensing screens on Cisco Unity Connection Administration web pages. Tag LicSubscribersMax licenses 10 subscribers, but 502 are being used. Please reduce usage to match the licensed limits or purchase additional licensed functionality.
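
The Detail text states both the licensed count and the current usage, so the size of the shortfall can be read directly from the event. The following sketch extracts those numbers with a regular expression. It assumes the wording matches the sample above; `license_overage` is a hypothetical helper, not a product function.

```python
import re

# Matches wording such as: "Tag LicSubscribersMax licenses 10 subscribers, but 502 are being used"
LICENSE_VIOLATION = re.compile(
    r"Tag (?P<tag>\w+) licenses (?P<licensed>\d+) \w+, but (?P<used>\d+) are being used"
)


def license_overage(detail):
    """Return the license tag, licensed count, usage, and overage, or None if no match."""
    match = LICENSE_VIOLATION.search(detail)
    if match is None:
        return None
    licensed = int(match.group("licensed"))
    used = int(match.group("used"))
    return {
        "tag": match.group("tag"),
        "licensed": licensed,
        "used": used,
        "overage": used - licensed,
    }


detail = "Tag LicSubscribersMax licenses 10 subscribers, but 502 are being used."
print(license_overage(detail))
```

For the sample event, the overage is 492 subscribers, which is the number to remove or to cover with additional licenses.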


Next Steps


Step 1 Right-click Synthetic RC event > Show Contained Events to display the corresponding raw events.

Step 2 Right-click Raw event > Event Details > Next Steps to display the recommendation.


UC30 - VM Resources - Memory

This use case describes the event that Prime Central for HCS receives if a VM exceeds its memory threshold. Prime Central for HCS performs SIA for this use case; no RCA is performed.

Observed RC-EL Events

None.

Observed SI-EL Events

An impact on the CUCM voice service also impacts voice mail and presence. In this example, the threshold is exceeded on a CUCM VM, so the voice, voice mail, and presence services are all affected because of the dependency. If the memory threshold is exceeded on a CUCxn or CUP VM, only that one service (CUCxn or CUP) is impacted.
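
The dependency described here can be pictured as a small lookup: an event on the voice service is propagated to voice mail and presence, while an event on voice mail or presence stays local. The sketch below encodes only that relationship. The service names and the `DEPENDENTS` mapping are illustrative assumptions, not the product's internal service model.

```python
# Services that are also marked impacted when the key service is impaired.
# Hypothetical mapping that mirrors the dependency described in the text.
DEPENDENTS = {
    "voice": ["voicemail", "presence"],  # CUCM impairment propagates to CUCxn and CUP
    "voicemail": [],                     # CUCxn impairment stays local
    "presence": [],                      # CUP impairment stays local
}


def impacted_services(affected_service):
    """Return the affected service followed by the services that depend on it."""
    return [affected_service] + DEPENDENTS.get(affected_service, [])


print(impacted_services("voice"))      # all three services are reported
print(impacted_services("voicemail"))  # only voice mail is reported
```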

Table A-84 Observed Service Events for UC30

Severity
Summary

Minor

Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C070-1 is Marginal.

Minor

Overall Attribute of the Customer_Presence_Service_Template tag of CUP-CL-C070-1 is Marginal.

Minor

Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C070-1 is Marginal.


Observed Other-EL Events

None.

Service Tree Event Overlay Location and Content

Table A-85 Observed Service Tree Events for UC30

Location
Summary

...-> VM_Resources

The virtual machine guest memory usage is high on CUCM-70-sub. Message: KVM_VM_Guest_Memory_Util_High[(Guest_Util>ON VM:cisco-10.11.3.145:ESX ON CUCM-70-sub (Guest_Util=76)]

...-> Voice Service

Meta event for Voice Service - C070


Next Steps


Step 1 Check vCenter to confirm the alarm.

Step 2 Add memory resources, if required, to rectify the alarm.


UC31 - VM Resources - CPU

This use case describes the event that Prime Central for HCS receives if a VM exceeds its CPU threshold. Prime Central for HCS performs SIA for this use case; no RCA is performed.

Root Cause Events Observed

None.

Service Events Observed

An impact on the CUCM voice service also impacts voice mail and presence. In this example, the threshold is exceeded on a CUCM VM, so the voice, voice mail, and presence services are all affected because of the dependency. If the CPU threshold is exceeded on a CUCxn or CUP VM, only that one service (CUCxn or CUP) is affected.

Table A-86 SI events Observed for UC31 - VM Resources - CPU

Severity
Summary

Minor

Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C071-1 is Marginal.

Minor

Overall Attribute of the Customer_Presence_Service_Template tag of CUP-CL-C071-1 is Marginal.

Minor

Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C071-1 is Marginal.


Other Events Observed

None.

Table A-87 Service Tree events Observed for UC31 - VM Resources - CPU

Location
Summary

...-> VM_Resources

CPU use high on CUCM-71-pub. Message: KVM_VM_CPU_Util_High[(Utilization>5) ON VM:cisco-10.11.3.152:ESX ON CUCM-71-pub (Utilization=10)]

...-> Voice Service

Meta event for Voice Service - C071


Next Steps

Check vCenter to confirm the alarm. Add CPU resources to the VM as needed to rectify the alarm.

UC32 - VM Resources - Disk usage

This use case describes the event that Prime Central for HCS receives if a VM exceeds its disk usage threshold. Prime Central for HCS performs SIA for this use case; no RCA is performed.

Root Cause Events Observed

None.

Service Events Observed

An impact on the CUCM voice service also impacts voice mail and presence. In this example, the threshold is exceeded on a CUCM VM, so the voice, voice mail, and presence services are all affected because of the dependency. If the disk usage threshold is exceeded on a CUCxn or CUP VM, only that one service (CUCxn or CUP) is affected.

Table A-88 SI events Observed for UC32 - VM Resources - Disk usage

Severity
Summary

Minor

Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C070-1 is Marginal.

Minor

Overall Attribute of the Customer_Presence_Service_Template tag of CUP-CL-C070-1 is Marginal.

Minor

Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C070-1 is Marginal.


Other Events Observed

None.

Service tree event overlay location

Table A-89 Service Tree events Observed for UC32 - VM Resources - Disk usage

Location
Summary

...-> VM_Resources

The virtual machine disk partition free space is low on CUCM-70-sub. Message: KVM_VM_Disk_Free_Low[(Percent_Free>=0 AND Percent_Free<10) ON VM:cisco-10.11.3.145:ESX ON CUCM-70-sub (Percent_Free=0)]

...-> Voice Service

Meta event for Voice Service - C070


Next Steps

Check vCenter to confirm the alarm, and remove files from the VM to free up disk space.

UC33 - VM Resources - CPU ready time

This use case describes the event that Prime Central for HCS receives if a VM exceeds its CPU ready time threshold. Prime Central for HCS performs SIA for this use case; no RCA is performed.

Root Cause Events Observed

None.

Service Events Observed

An impact on the CUCM voice service also impacts voice mail and presence. In this example, the event is triggered on a CUCM VM, so the voice, voice mail, and presence services are all affected because of the dependency. If the event is triggered on a CUCxn or CUP VM, only that one service (CUCxn or CUP) is affected.

Table A-90 Service Events Observed for UC33 - VM Resources - CPU ready time

Severity
Summary

Minor

Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C070-1 is Marginal.

Minor

Overall Attribute of the Customer_Presence_Service_Template tag of CUP-CL-C070-1 is Marginal.

Minor

Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C070-1 is Marginal.


Other Events Observed

None.

Table A-91 Service Tree events Observed for UC33 - VM Resources - CPU ready time

Location
Summary

...-> VM_Resources

The CPU percent ready is high on CUCM-70-sub. Message: KVM_VM_CPU_Ready_High[(Percent_Rdy>5) ON VM:cisco-10.11.3.148:ESX ON CUCM-70-sub (Percent_Rdy=7)]

...-> VM_Resources

The CPU percent ready is high on CUCM-70-pub. Message: KVM_VM_CPU_Ready_High[(Percent_Rdy>5) ON VM:cisco-10.11.3.148:ESX ON CUCM-70-pub (Percent_Rdy=8)]

...-> Voice Service

Meta event for Voice Service - C070


Next Steps


Step 1 Check vCenter DM to confirm the alarm and make sure the host is not overstressed by hosting VMs.

Step 2 Manually move some VMs to other hosts, if needed.

Step 3 Configure DRS, if possible, to minimize the stress on any particular hosts among the cluster.


UC34 - VM Resources - Disk latency

This use case describes the event that Prime Central for HCS receives if a VM exceeds its disk latency threshold. Prime Central for HCS performs SIA for this use case; no RCA is performed.

Root Cause Events Observed

None.

Service Events Observed

An impact on the CUCM voice service also impacts voice mail and presence. In this example, the event is triggered on a CUCM VM, so the voice, voice mail, and presence services are all affected because of the dependency. If the event is triggered on a CUCxn or CUP VM, only that one service (CUCxn or CUP) is affected.

Table A-92 Service Events Observed for UC34 - VM Resources - Disk latency

Severity
Summary

Minor

Overall Attribute of the Customer_Voicemail_Service_Template tag of CUCxn-CL-C070-1 is Marginal.

Minor

Overall Attribute of the Customer_Presence_Service_Template tag of CUP-CL-C070-1 is Marginal.

Minor

Overall Attribute of the Customer_Voice_Service_Template tag of CUCM-CL-C070-1 is Marginal.


Other Events Observed

None.

Table A-93 Service tree events Observed for UC34 - VM Resources - Disk latency

Location
Summary

...-> VM_Resources

Alarm ''Virtual Machine Disk Latency High'' on CUCM-70-sub changed from Red to Yellow. Message: KVM_VM_Disk_Latency[(Event_Type=N"AlarmStatusChangedEvent" AND Event_TextLIKEN"*Virtual*Machine*Disk*Latency*" ON VM:cisco-10.11.3.148:ESX ON 31448316 (Event_Type=AlarmStatusChangedEvent Event_Text=Alarm ''Virtual Machine Disk Latency High'' on CUCM-70-sub changed from Red to Yellow)]

...-> VM_Resources

Alarm ''Virtual Machine Disk Latency High'' on CUCM-70-pub changed from Red to Yellow. Message: KVM_VM_Disk_Latency[(Event_Type=N"AlarmStatusChangedEvent" AND Event_TextLIKEN"*Virtual*Machine*Disk*Latency*" ON VM:cisco-10.11.3.148:ESX ON 31448313 (Event_Type=AlarmStatusChangedEvent Event_Text=Alarm ''Virtual Machine Disk Latency High'' on CUCM-70-pub changed from Red to Yellow)]

...-> Voice Service

Meta event for Voice Service - C070.


Next Steps


Step 1 Check vCenter DM to confirm the alarm and make sure the LUN is not overstressed by hosting VMs.

Step 2 Manually move some VMs to another LUN, if needed.

Step 3 Configure Storage DRS, if possible, to minimize the stress on any particular LUN.


UC35 - ASR1K - Chassis Failure

This use case describes the impact on offnet voice service in the event of a chassis failure in the physical ASR1K router.

Root Cause Events Observed

None

Service Events Observed

Offnet voice service events are listed in the following table:

Table A-94 Service Events Observed for UC35 - ASR1K - Chassis Failure

Severity
Summary

Critical

<Customer-Name> offnet voice service is bad. Raised when all ASR1Ks under the CUBE-SP service are down.

Minor

<Customer-Name> offnet voice service is marginal. Raised when one or more (but not all) ASR1Ks under the CUBE-SP service are down.
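
The two rows describe a simple rollup rule: the service is bad (Critical) when every ASR1K under the CUBE-SP service is down, and marginal (Minor) when at least one, but not all, are down. A minimal sketch of that rule follows; the function name and return values are illustrative assumptions, not product output.

```python
def offnet_service_state(total_routers, routers_down):
    """Map the number of ASR1Ks that are down to a (severity, state) pair.

    Returns None when no router is down, because no service event is expected.
    """
    if total_routers <= 0 or not 0 <= routers_down <= total_routers:
        raise ValueError("router counts are inconsistent")
    if routers_down == 0:
        return None
    if routers_down == total_routers:
        return ("Critical", "bad")
    return ("Minor", "marginal")


print(offnet_service_state(2, 1))  # ('Minor', 'marginal')
print(offnet_service_state(2, 2))  # ('Critical', 'bad')
```

Note that with a single ASR1K under the service, any chassis failure takes down all routers and therefore maps to Critical under this rule.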


Other Events Observed

None

Table A-95 Service Tree events Observed for UC35 - ASR1K - Chassis Failure

Location
Summary

...->Router Availability

Unresponsive::Component= 172.20.127.100; SystemObjectID= .1.3.6.1.4.1.9.1.923; Description= Cisco IOS Software, IOS-XE Software (PPC_LINUX_IOSD-ADVENTERPRISEK9-M), Version 15.3(1)S, RELEASE SOFTWARE (fc4)\X0D\X0ATechnical Support: http://www.cisco.com/techsupport\X0D\X0ACopyright (c) 1986-2012 by Cisco Systems, Inc.\X0D\X0ACompiled Tue 27-Nov-12 11:05 by mcpre; DiscoveredFirstAt= 01-28-2013 10:20:18; Type= ROUTER; DisplayClassName= Router; SNMPAddress= 172.20.127.100; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 01-28-2013 10:20:18; Default Event Name= Unresponsive; Description URL= < http://172.23.2.235:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=Unresponsive >;


Next Steps


Step 1 Check vCenter DM to confirm the alarm and make sure the host is not overstressed by hosting VMs.

Step 2 Manually move some VMs to other hosts, if needed.

Step 3 Configure DRS, if possible, to minimize the stress on any particular hosts among the cluster.


UC36 - ASR1K - Power Supply/Fan Failure

This use case describes the impact on offnet voice service in the event of a power supply or fan failure in the physical ASR1K router.

Root Cause Events Observed

None

Service Events Observed

Offnet voice service events are listed in the following table:

Table A-96 Service Events Observed for UC36 - ASR1K - Power Supply/Fan Failure

Severity
Summary

Minor

<Customer-Name> offnet voice service is marginal


Other Events Observed

None

Table A-97 Service Tree events Observed for UC36 - ASR1K - Power Supply/Fan Failure

Location
Summary

...->Router Environmental

TemperatureHigh::Component= 172.20.127.100; SystemObjectID= .1.3.6.1.4.1.9.1.923; Description= Cisco IOS Software, IOS-XE Software (PPC_LINUX_IOSD-ADVENTERPRISEK9-M), Version 15.3(1)S, RELEASE SOFTWARE (fc4)\X0D\X0ATechnical Support: http://www.cisco.com/techsupport\X0D\X0ACopyright (c) 1986-2012 by Cisco Systems, Inc.\X0D\X0ACompiled Tue 27-Nov-12 11:05 by mcpre; DiscoveredFirstAt= 01-28-2013 10:20:18; Type= ROUTER; DisplayClassName= Router; SNMPAddress= 172.20.127.100; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 01-28-2013 10:20:18; Default Event Name= TemperatureHigh; DescriptionURL= < http://172.23.2.235:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=TemperatureHigh >;


UC37 - ASR1K - RP/ES/SPA Failure

This use case describes the impact on offnet voice service in the event of a router card failure that results in interface down events.

Root Cause Events Observed

None

Service Events Observed

Offnet voice service events are listed in the following table:

Table A-98 Service Events Observed for UC37 - ASR1K - RP/ES/SPA Failure

Severity
Summary

Minor

<Customer-Name> offnet voice service is marginal.


Other Events Observed

None

Table A-99 Service Tree Events Observed for UC37 - ASR1K - RP/ES/SPA Failure

Location
Summary

...->Router Interface Availability

OperationallyDown::Component= IF-10.13.1.72/33 [SB86]; AdminStatus= UP; DuplexMode= FULLDUPLEX; OperStatus= UP; MaxSpeed= 56000; Type= GENERIC; Mode= NORMAL; IsFlapping= false; InterfaceCode= CODEUNKNOWN; LastChangedAt= 01-08-2013 22:11:02; Default Event Name= OperationallyDown; DescriptionURL= < http://150.0.0.44:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=OperationallyDown >;


UC38 - SIP Trunk from Leaf to CUBE-SP - Loss of SIP Trunk

This use case describes the impact on offnet voice service when the SIP trunk between CUBE-SP and the leaf cluster is lost without a loss of IP connectivity.

Root Cause Events Observed

None

Service Events Observed

Offnet voice service events are listed in the following table:

Table A-100 Service Events Observed for UC38 - SIP Trunk from Leaf to CUBE-SP - Loss of SIP Trunk

Severity
Summary

Critical

<Customer-Name> offnet voice service is bad. (Generated when all SIP trunks are out of service.)

Minor

<Customer-Name> offnet voice service is marginal. (Generated when one or more SIP trunks are out of service.)
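The severity rule in the table above can be sketched as a small function: Critical when every SIP trunk is out of service, Minor when at least one (but not all) is out of service. The function name and the list-of-booleans input are illustrative assumptions, not the product's internal model.

```python
from typing import List, Optional


def offnet_severity(trunk_in_service: List[bool]) -> Optional[str]:
    """Return the service event severity for a set of SIP trunk states.

    Each list entry is True when that trunk is in service.
    """
    if not trunk_in_service:
        return None  # assumption: no trunks monitored means no event
    if not any(trunk_in_service):
        return "Critical"  # all trunks are out of service
    if not all(trunk_in_service):
        return "Minor"  # one or more trunks are out of service
    return None  # every trunk is in service


for states in ([False, False], [True, False], [True, True]):
    print(states, offnet_severity(states))
```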


Other Events Observed

None

Table A-101 Service Tree events Observed for UC38 - SIP Trunk from Leaf to CUBE-SP - Loss of SIP Trunk

Location
Summary

...->SIP Trunk under CUCM Offnet Service

SIP Trunk Out Of Service::Component= 172.23.2.200-hcs-asr-1; SIP Trunk Name= hcs-asr-1; Unavailable remote peers with Reason Code= [ local=2, 172.23.218.2]; Default Event Name= SIP Trunk Out Of Service; DescriptionURL= < http://172.23.2.235:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=SIPTrunkOutOfService >;

...->SIP Trunk under CUCM Offnet Service

SIP Trunk Partially In Service::Component= 172.23.2.200-hcs-asr-1; SIP Trunk Name= hcs-asr-1; Available remote peers for this SIP trunk= [195.5.171.1]; Unavailable remote peers with Reason Code= [ local=2, 1.1.1.1]; Default Event Name= SIP Trunk Partially In Service; DescriptionURL= < http://172.23.2.235:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=SIPTrunkPartiallyInService >;


UC39 - CUBE-SP Adjacency Status

This use case describes the impact on offnet voice service when adjacency is lost because the CUCM application is down.

Root Cause Events Observed

Root cause events are generated when CUCM is shut down. The following events are displayed as synthetic root cause events:

OM_CUCM_OM_Connectivity—Indicates that Cisco Unified Communications Manager is down.

OM_CUCM_NodeRestart—Indicates that Cisco Unified Communications Manager node has been restarted.

OM_CUCM_Processes—Indicates that Cisco Unified Communications Manager services are down.

Service Events Observed

Offnet voice service events are listed in the following table:

Table A-102 Service Events Observed for UC39 - CUBE-SP Adjacency Status

Severity
Summary

Critical

<Customer-Name> offnet voice service is bad. (Generated when all northbound or southbound adjacencies are down.)

Minor

<Customer-Name> offnet voice service is marginal. (Generated when one or more northbound or southbound adjacencies are down.)


Other Events Observed

None

Table A-103 Service Tree events Observed for UC39 - CUBE-SP Adjacency Status

Location
Summary

Southbound Adjacency Service -> Adjacency Name -> Adjacency Status

AdjacencyDetached::Component= hcs-sbc/Cust1/SIP; SBCServiceName= hcs-sbc; AdjacencyType= SIP; AdjacencyName= Cust1; AdjacencyAccountName= Aggregation; Default Event Name= AdjacencyDetached; DescriptionURL= < http://172.23.2.235:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=AdjacencyDetached >;

Northbound Adjacency Service -> Adjacency Name -> Adjacency Status

AdjacencyDetached::Component= hcs-sbc/NB-hcs-sbc-1/SIP; SBCServiceName= hcs-sbc; AdjacencyType= SIP; AdjacencyName= NB-hcs-sbc-1; AdjacencyAccountName= NB-hcs-sbc-1; Default Event Name= AdjacencyDetached; DescriptionURL= < http://172.23.2.235:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=AdjacencyDetached >;


UC40 - Voice Quality Degradation

This use case describes the impact on offnet service when MOS quality is critical or major.

Root Cause Events Observed

None

Service Events Observed

Offnet voice service events are listed in the following table:

Table A-104 Service Events Observed for UC40 - Voice Quality Degradation

Severity
Summary

Minor

<Customer-Name> offnet voice service is marginal.


Other Events Observed

None

Table A-105 Service Tree events Observed for UC40 - Voice Quality Degradation

Location
Summary

Southbound Adjacency Service -> Adjacency Name -> Adjacency QoS

MOSCQEReachedMajorThreshold::Component= hcs-sbc/Cust1; SBCServiceName= hcs-sbc; AdjacencyName= Cust1; MOSCurrentValue= 41; AlertPreviousLevel= Normal; NormalAlertCount= 0; MinorAlertCount= 1; MajorAlertCount= 0; CriticalAlertCount= 0; AlertSummaryPeriod= 5 minutes; AlarmDescription= Approximated MOSCQE value crossed major level configured; Default Event Name= MOSCQEReachedMajorThreshold; DescriptionURL= < http://172.23.85.117:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=MOSCQEReachedMajorThreshold >

Northbound Adjacency Service -> Adjacency Name -> Adjacency QoS

MOSCQEReachedCriticalThreshold::Component= hcs-sbc/NB-hcs-sbc-1; SBCServiceName= hcs-sbc; AdjacencyName= NB-hcs-sbc-1; MOSCurrentValue= 41; AlertPreviousLevel= Normal; NormalAlertCount= 0; MinorAlertCount= 1; MajorAlertCount= 0; CriticalAlertCount= 0; AlertSummaryPeriod= 5 minutes; AlarmDescription= Approximated MOSCQE value crossed critical level configured; Default Event Name= MOSCQEReachedCriticalThreshold; DescriptionURL= < http://172.23.85.117:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=MOSCQEReachedCriticalThreshold >;


UC41 - CUBE - SP Security Violation

This use case describes the impact on offnet voice service when CUBE-SP security events take place.

Root Cause Events Observed

None

Service Events Observed

Offnet voice service events are listed in the following table:

Table A-106 Service Events Observed for UC41 - CUBE - SP Security Violation

Severity
Summary

Minor

<Customer-Name> offnet voice service is marginal.


Other Events Observed

None

Table A-107 Service Tree events Observed for UC41 - CUBE-SP Security Violation

Location
Summary

...-> CUBE-SP Security

SourceAlert::Component= hcs-sbc/3/1/3/3.3.3.33/1/1.1.1.112/3/VPNID; SBCServiceName= hcs-sbc; VdbeId= 3; GateId= 1; FlowPairId= 3; LocalAddressType= dns; LocalAddress= 3.3.3.33; LocalPort= 1; RemoteAddressType= ipv4z; RemoteAddress= 1.1.1.112; RemotePort= 3; VpnId= VPNID; AlarmDescription= This is to alert that some unwanted data packets are received by the system from an undesirable IP/port.; Default Event Name= SourceAlert; DescriptionURL= < http://172.23.85.117:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=SourceAlert >;

...-> CUBE-SP Security

DynamicBlackList::Component= hcs-sbc/globalnew12/0.0.0.23/0; SBCServiceName= hcs-sbc; SubFamily= Blacklist VPN; VpnId= globalnew12; AddressType= ipv4; Address= 0.0.0.23; TransportType= UDP; PortNumber= 0; AlarmDescription= source is added to or removed from the blacklist table; Default Event Name= DynamicBlackList; DescriptionURL= < http://172.23.85.117:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=DynamicBlackList >;


UC42 - CUBE-SP Resource Performance Degradation

This use case describes the impact on offnet voice service when CUBE-SP resource performance degrades.

Root Cause Events Observed

None

Service Events Observed

Offnet voice service events are listed in the following table:

Table A-108 Service Events Observed for UC42 - CUBE-SP Resource Performance Degradation

Severity
Summary

Minor

<Customer-Name> offnet voice service is marginal.


Other Events Observed

None

Table A-109 Service Tree events Observed for UC42 - CUBE-SP Resource Performance Degradation

Location
Summary

...->CUBE-SP Performance

MemoryCongestion::Component= hcs-sbc/3; SBCServiceName= hcs-sbc; AlarmDescription= CPU/Memory congestion in SBC is raised; Default Event Name= MemoryCongestion; DescriptionURL= < http://172.23.85.117:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=MemoryCongestion >;

...->CUBE-SP Performance

CPUCongestion::Component= hcs-sbc/2; SBCServiceName= hcs-sbc; AlarmDescription= CPU/Memory congestion in SBC is raised; Default Event Name= CPUCongestion; DescriptionURL= < http://172.23.85.117:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=CPUCongestion >;


UC43 - CUBE-SP SLA Violation

This use case describes the impact on offnet voice service when a CUBE-SP SLA violation takes place.

Root Cause Events Observed

None

Service Events Observed

Offnet voice service events are listed in the following table:

Table A-110 Service Events Observed for UC43 - CUBE-SP SLA Violation

Severity
Summary

Minor

<Customer-Name> offnet voice service is marginal.


Other Events Observed

None

Table A-111 Service Tree events Observed for UC43 - CUBE-SP SLA Violation

Location
Summary

...->CUBE-SP Performance

SLAViolation::Component= hcs-sbc/unknown/global/call setup; SBCServiceName= hcs-sbc; SLAPolicyAccountName= unknown; SLAPolicyScope= global; SLAPolicyLimit= 700; SLACurrentUsage= 700; SLAViolationEvent= call setup; SLAPolicyRestriction= allowable number of concurrent calls; AlarmDescription= Violation of Service Level Agreement as described in the policy tables; Default Event Name= SLAViolation; DescriptionURL= < http://172.23.85.117:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=SLAViolation >;
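In the sample SLAViolation event, SLACurrentUsage equals SLAPolicyLimit (700 concurrent calls). A minimal sketch of checking those two attributes once the summary has been parsed into a dictionary follows. The function name is hypothetical, and treating "usage greater than or equal to limit" as the trigger is an assumption drawn from this single sample, not a documented rule.

```python
from typing import Dict


def sla_limit_reached(attributes: Dict[str, str]) -> bool:
    """Report whether current usage has reached the SLA policy limit.

    Assumption: the comparison is >=, inferred from the sample event only.
    """
    usage = int(attributes["SLACurrentUsage"])
    limit = int(attributes["SLAPolicyLimit"])
    return usage >= limit


sample_attributes = {
    "SLAPolicyScope": "global",
    "SLAPolicyLimit": "700",
    "SLACurrentUsage": "700",
    "SLAViolationEvent": "call setup",
}
print(sla_limit_reached(sample_attributes))
```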


CP1 - CUCMIP Critical Processes Failure

This use case describes the events that the Prime Central for HCS dashboard displays if a critical process fails in CUCMIP. Prime Central for HCS generates RC and SI events for such incidents.

Observed RC-EL Events

When a critical process such as sipd is down, the CUCMIP server generates a Service Down event. CUOM processes the event and transmits it to Prime Central for HCS.

Table A-112 Observed RC-EL Events for CP1

Severity
EventTypeId
Summary

Critical

OM_CUP_OM_ Processes

Synthetic Event for OM_CUP_Processes group events from cup-82-pub.customer.com


Observed SI-EL Events

If a critical process such as sipd is down, the presence and IM features of soft clients such as Cisco Jabber are affected.

Table A-113 Observed SI-EL Events for CP1

Severity
Summary

Minor

Overall Attribute of the Customer_Presence_Service_Template tag of CUP-87-pub is Marginal.


Observed Other-EL Events

None

Correlated Events in Service Tree

The following table shows the events that are overlaid in Service Tree.

Table A-114 Observed Service Tree Events for CP1

Node
Summary

87-pub.customer.com

ServiceDown::Component= VS-cup-87-pub.customer.com/ Cisco SIP Proxy; ProductName= Cisco SIP Proxy; CurrentState= Stopped; Default Event Name= ServiceDown; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=ServiceDown >;


Next Steps


Step 1 Abort the script that will kill the sipd process.

Step 2 Go to the UC Serviceability page and start the Cisco SIP Proxy service.


CP2 - Application Cold Failure - CUCMIP

This use case describes the events that Prime Central for HCS receives if a CUCMIP server restarts. This type of incident generates RC and SI events.

Observed RC-EL Events

When the CUCMIP server restarts, a synthetic RCA event of type OM_CUP_OM_Connectivity is observed.

Table A-115 Observed RC-EL Events for CP2

Severity
EventTypeId
Summary

Critical

OM_CUP_OM_Connectivity

Synthetic Event for OM_CUP_OM_Connectivity group events from cup-82-pub.customer.com


Observed SI-EL Events

CUP voice service impacts voice mail and presence. The following table shows SI-EL events observed during testing.

Table A-116 Observed SI-EL Events for CP2

Severity
Summary

Minor

Overall Attribute of the Customer_Presence_Service_Template tag of CUP-82-pub is Marginal.


Observed Other-EL Events

None.

Correlated Events in Service Tree

The following table shows events that are correlated in the Service Tree.

Table A-117 Observed Service Tree Events for CP2

Location
Summary

...-> Cluster_Availability--> Pub: CUP-82-pub

RTMTDataMissing::Component= cup-82-pub.customer.com; Name= cup-82-pub.customer.com; HostDescription= Hardware:VMware, 4 Intel(R) Xeon(R) CPU X5680 @ 3.33GHz, 6144 MB Memory: Software:UCOS 4.0.0.0-44; Reason= Error collecting RTMT data. Error reason: HTTP communication error; Default Event Name= RTMTDataMissing; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=RTMTDataMissing >;

...-> Cluster_Availability--> Pub: CUP-82-pub

PerformancePollingStopped::Component= cup-82-pub.customer.com; Error Message String= 16-Oct-2012 15:07:59 EDT,cup-82-pub.customer.com,192.6.4.210,Cannot collect data. The device is experiencing communication problems. Device may be in partially monitored state. Check HTTP(S) credentials.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=PerformancePollingStopped >;
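The Error Message String of a PerformancePollingStopped event packs four comma-separated parts: a timestamp, the host name, the IP address, and the reason text. A minimal sketch of separating them follows. The function name is hypothetical, the time-zone abbreviation is kept as plain text rather than interpreted, and month parsing with %b assumes an English locale.

```python
from datetime import datetime
from typing import Any, Dict


def parse_polling_error(message: str) -> Dict[str, Any]:
    """Split 'timestamp,host,ip,reason' from an Error Message String."""
    # maxsplit=3 keeps any commas inside the reason text intact.
    timestamp_text, host, address, reason = message.split(",", 3)
    date_part, time_part, zone = timestamp_text.split()
    return {
        # The zone abbreviation (for example EDT) is not applied to the datetime.
        "timestamp": datetime.strptime(f"{date_part} {time_part}", "%d-%b-%Y %H:%M:%S"),
        "zone": zone,
        "host": host,
        "address": address,
        "reason": reason.strip(),
    }


sample_message = (
    "16-Oct-2012 15:07:59 EDT,cup-82-pub.customer.com,192.6.4.210,"
    "Cannot collect data. The device is experiencing communication problems."
)
parsed = parse_polling_error(sample_message)
print(parsed["host"], parsed["address"], parsed["timestamp"].isoformat())
```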


Next Steps

The system restarts automatically, and the events are cleared in 60 minutes.

CP3 - VMware VM Failure - CUCMIP

This use case describes the events that the Prime Central for HCS dashboard displays if a VM running CUCMIP fails abruptly. Prime Central for HCS generates RC and SI events for such incidents.

Observed RC-EL Events

When the VM shuts down, numerous synthetic RCA events are observed, including VC_VM_Avlblty and OM_CUP_OM_Connectivity. Eventually, Prime Central for HCS stabilizes to one root cause, VC_VM_Avlblty.

Table A-118 Observed RC-EL Events for CP3

Severity
EventTypeId
Summary

Critical

VC_VM_Avlblty

Synthetic Event for VC_VM_Avlblty group events from CUP-82-pub


Observed SI-EL Events

VM failure impacts presence service. The following table shows SI-EL events observed during testing.

Table A-119 Observed SI-EL Events for CP3

Severity
Summary

Minor

Overall attribute of the Customer_Presence_Service_Template tag of CUP-82-pub is Marginal.


Observed Other-EL Events

None.

Service Tree Event Overlay Location and Content

SIA events are overlaid in the service tree view portlet. The following table shows service tree events observed during testing:

Table A-120 Observed Service Tree Events for CP3

Location
Summary

...-> Cluster_Availability--> Pub: CUP-82-pub

RTMTDataMissing::Component= cup-82-pub.customer.com; Name= cup-82-pub.customer.com; HostDescription= Hardware:VMware, 4 Intel(R) Xeon(R) CPU X5680 @ 3.33GHz, 6144 MB Memory: Software:UCOS 4.0.0.0-44; Reason= Error collecting RTMT data. Error reason: HTTP communication error; Default Event Name= RTMTDataMissing; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=RTMTDataMissing >;

...-> Cluster_Availability--> Pub: CUP-82-pub

PerformancePollingStopped::Component= cup-82-pub.customer.com; Error Message String= 16-Oct-2012 13:28:00 EDT,cup-82-pub.customer.com,192.6.4.210,Cannot collect data. The device is experiencing communication problems. Device may be in partially monitored state. Check HTTP(S) credentials.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=PerformancePollingStopped >;

...-> Cluster_Availability--> Pub: CUP-82-pub

Unresponsive::Component= cup-82-pub.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Hardware:VMware, 4 Intel(R) Xeon(R) CPU X5680 @ 3.33GHz, 6144 MB Memory: Software:UCOS 4.0.0.0-44; DiscoveredFirstAt= 10-11-2012 13:53:45; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.4.210; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 10-16-2012 06:03:55; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=Unresponsive >;

...-> Cluster_Availability--> Pub: CUP-82-pub

The virtual machine CUP-82-pub running on 10.11.3.158 is offline. Message: KVM_VM_Powered_Off_Cisco_HCM [(Event_Type=N"VmPoweredOffEvent") ON VM:tb1-vcent-10.11.3.158:ESX ON 39450127 (Event_Type=VmPoweredOffEvent)]


Next Steps

Power on the affected VM.

CP4 - CUCMIP VMware ESXi Host Failure

This use case describes the events that Prime Central for HCS receives if the VMware ESXi host fails. This type of incident generates RC and SI events. The CUCMIP VM is automatically brought up in another host if HA is enabled on the cluster. If HA is not configured for the cluster, CUCMIP nodes stay down until the ESXi Host is recovered.

Observed RC-EL Events

When the ESXi host shuts down, numerous synthetic RCA events are observed, including VC_Host_Avlblty, VC_VM_Avlblty, UCS_BladeLinks, and OM_CUP_OM_Connectivity. Eventually, there are two synthetic RCA events outstanding: VC_Host_Avlblty and UCS_Bladelinks. These two are sibling events in the correlation tree. The following table shows RC-EL events observed during testing.

Table A-121 Observed RC-EL Events for CP4

Severity
EventTypeId
Summary

Major

UCS_BladeLinks

Synthetic Event for UCS_BladeLinks group events from 10.13.2.8

Critical

VC_Host_Avlblty

Synthetic Event for VC_Host_Avlblty group events from 10.13.3.34
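The way numerous synthetic events settle into a small set of root causes can be illustrated with a dependency lookup: an event remains a root cause only if none of its ancestors in the correlation tree is also active. The sketch below uses a hypothetical parent map. The source states only that VC_Host_Avlblty and UCS_BladeLinks are siblings; the rest of the hierarchy shown here is an assumption for illustration and may differ from the RCA correlation tree that Prime Central for HCS actually uses.

```python
from typing import Dict, Set

# Hypothetical child -> parent relations (assumed, not taken from the product).
PARENT: Dict[str, str] = {
    "OM_CUP_OM_Connectivity": "VC_VM_Avlblty",
    "VC_VM_Avlblty": "VC_Host_Avlblty",
    "VC_Host_Avlblty": "UCS_Blade_Avlblty",
    "UCS_BladeLinks": "UCS_Blade_Avlblty",  # sibling of VC_Host_Avlblty
}


def root_causes(active: Set[str]) -> Set[str]:
    """Keep only the active events that have no active ancestor."""
    roots = set()
    for event in active:
        ancestor = PARENT.get(event)
        # Walk up the tree until an active ancestor or the top is reached.
        while ancestor is not None and ancestor not in active:
            ancestor = PARENT.get(ancestor)
        if ancestor is None:
            roots.add(event)
    return roots


esxi_host_failure = {
    "VC_Host_Avlblty",
    "VC_VM_Avlblty",
    "UCS_BladeLinks",
    "OM_CUP_OM_Connectivity",
}
print(sorted(root_causes(esxi_host_failure)))
```

With this assumed map, the ESXi host failure set reduces to the two sibling events, and a VM-only failure set reduces to VC_VM_Avlblty.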


Observed SI-EL Events

CUCMIP service is impacted.

Table A-122 Observed SI-EL Events for CP4

Severity
Summary

Critical

Overall attribute of the Customer_Presence_Service_Template tag of CUP-80-pub is Bad.


Observed Other-EL Events

Prime Central for HCS does not analyze these events, but they could point to potential root causes for impacted services.


Table A-123 Observed Other-EL Events for CP4

Severity (S)/Node (N)
Summary

S = Major

N = 10.13.2.8

Network Interface (ifIndex = 469775808) Down, should be Up (ifEntry.469775808)

S = Major

N = 10.13.2.9

Network Interface (ifIndex = 469775824) Down, should be Up (ifEntry.469775824)

S = Major

N = 10.13.2.8

Link Down (server 3/4, VNIC eth0)

S = Major

N = 10.13.2.9

Link Down (server 3/4, VNIC eth1)

S = Major

N = 10.13.2.8

Link Down (server 3/4, VHBA fc0)

S = Major

N = 10.13.2.9

Link Down (server 3/4, VHBA fc1)

S = Minor

N = 10.13.2.8

Fibre Channel Trunk Interface Down, Port Gracefully Shutdown (fcTrunkIfEntry.503317342.301)

S = Minor

N = 10.13.2.9

Fibre Channel Trunk Interface Down, Port Gracefully Shutdown (fcTrunkIfEntry.503317343.302)
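The Other-EL rows above list labeled fields (S for severity, N for node) followed by an unlabeled summary line. A minimal sketch of regrouping such flattened lines into one record per event follows. The function name is hypothetical, the C, EN, and ET labels are accepted because other Other-EL tables in this appendix use them, and the sketch assumes that each summary fits on a single line.

```python
import re
from typing import Dict, Iterable, List

# Labeled cells look like "S = Major" or "N= 10.13.2.8".
FIELD = re.compile(r"^(S|C|N|EN|ET)\s*=\s*(.*)$")


def parse_other_events(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Group labeled cells and the summary line that follows them."""
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = FIELD.match(line)
        if match:
            current[match.group(1)] = match.group(2)
        else:
            # An unlabeled line is the summary and closes the record.
            current["Summary"] = line
            records.append(current)
            current = {}
    return records


sample_lines = [
    "S = Major", "", "N = 10.13.2.8", "",
    "Network Interface (ifIndex = 469775808) Down, should be Up (ifEntry.469775808)",
    "", "S = Major", "", "N = 10.13.2.9", "",
    "Link Down (server 3/4, VNIC eth1)",
]
other_events = parse_other_events(sample_lines)
print(len(other_events), other_events[0]["N"])
```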


Service Tree Event Overlay Location and Content

SIA events are overlaid on the service tree view portlet.

Table A-124 Observed Service Tree Events for CP4

Location
Summary

...-> Cluster_Availability--> Node:CUP-80-pub

PerformancePollingStopped::Component= cup-80-pub.customer.com; Error Message String= 23-Oct-2012 15:15:41 EDT,cup-80-pub.customer.com,192.6.6.190,Cannot collect data. The device is experiencing communication problems. Device may be in partially monitored state. Check HTTP(S) credentials.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=PerformancePollingStopped >;

...-> Cluster_Availability--> Node:CUP-80-sub

PerformancePollingStopped::Component= cup-80-sub.customer.com; Error Message String= 23-Oct-2012 15:15:41 EDT,cup-80-sub.customer.com,192.6.6.191,Cannot collect data. The device is experiencing communication problems. Device may be in partially monitored state. Check HTTP(S) credentials.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=PerformancePollingStopped >;

...-> Cluster_Availability--> Node:CUP-80-pub

Unresponsive::Component= cup-80-pub.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Hardware:VMware, 4 Intel(R) Xeon(R) CPU X5680 @ 3.33GHz, 6144 MB Memory: Software:UCOS 4.0.0.0-44; DiscoveredFirstAt= 10-17-2012 18:48:37; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.6.190; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 10-23-2012 06:04:21; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=Unresponsive >;

...-> Cluster_Availability--> Node:CUP-80-sub

Unresponsive::Component= cup-80-sub.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Hardware:VMware, 4 Intel(R) Xeon(R) CPU X5680 @ 3.33GHz, 6144 MB Memory: Software:UCOS 4.0.0.0-44; DiscoveredFirstAt= 10-17-2012 19:37:09; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.6.191; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 10-23-2012 06:04:05; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=Unresponsive >;

...-> VM Availability

The virtual machine CUP-80-pub running on host 10.13.3.34 is Disconnected. Message: KVM_VM_Disconnected_Cisco_HCM [(Event_Type=N"VmDisconnectedEvent") ON VM:tb3-vcent-10.13.3.34:ESX ON 2530557 (Event_Type=VmDisconnectedEvent)]

...-> VM Availability

The virtual machine CUP-80-sub running on host 10.13.3.34 is Disconnected. Message: KVM_VM_Disconnected_Cisco_HCM [(Event_Type=N"VmDisconnectedEvent") ON VM:tb3-vcent-10.13.3.34:ESX ON 2530557 (Event_Type=VmDisconnectedEvent)]


Next Steps


Step 1 The VM on the host is automatically brought up in another host if HA is enabled.

Step 2 Follow the steps given below to bring back the original host:

a. Go to UCSM to bring the host back through boot server.

b. Manually power on CUCMIP VM if HA is not configured and VM is not configured to restart with host.

c. Drag and drop CUCMIP VM to the original host if HA is enabled.
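The recovery substeps depend on two settings: whether HA is enabled on the cluster, and whether the VM is configured to restart with the host. A minimal sketch of that decision follows; the function and parameter names are hypothetical, and the returned strings simply restate the substeps above.

```python
from typing import List


def recovery_steps(ha_enabled: bool, vm_restarts_with_host: bool) -> List[str]:
    """List the manual actions needed to restore the original host layout."""
    steps = ["Go to UCSM and bring the host back through boot server."]
    if ha_enabled:
        # HA already restarted the VM elsewhere, so it only needs to move back.
        steps.append("Drag and drop the CUCMIP VM to the original host.")
    elif not vm_restarts_with_host:
        steps.append("Manually power on the CUCMIP VM.")
    return steps


for ha, auto_restart in ((True, False), (False, False), (False, True)):
    print(ha, auto_restart, recovery_steps(ha, auto_restart))
```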


CP5 - CUCMIP UCS Blade Failure

This use case describes the events that Prime Central for HCS receives if a UCS blade fails. This type of incident generates RC and SI events. The CUCMIP VM is automatically brought up in another host if HA is enabled on the cluster. If HA is not configured for the cluster, CUCMIP nodes stay down until the UCS blade is replaced.

Table A-125 Observed RC-EL Events for CP5

Severity
EventTypeId
Summary

Critical

UCS_Blade_Avlblty

Synthetic Event for UCS_Blade_Avlblty group events from 10.13.2.8


Observed SI-EL Events

CUCMIP service is impacted.

Table A-126 Observed SI-EL Events for CP5

Severity
EventTypeId
Summary

Critical

CUST_C080_CLS_CUP_CUP-80-pub

Overall Attribute of the Customer_Presence_Service_Template tag of CUP-80-pub is Bad.


Observed Other-EL Events

Prime Central for HCS does not analyze these events, but they could point to potential root causes for the impacted services.

Table A-127 Observed Other-EL Events for CP5 

Severity (S)/Customer (C)/Node (N)
EventName (EN)/EventTypeId (ET)
Summary

S = Indeterminate

N = 10.13.2.8

EN = fltAdaptorUnitAdaptorReachability

ET = default

Overall Attribute of the Customer_Presence_Service_Template tag of CUP-80-pub is Bad.

S = Major

N = 10.13.2.8

EN = fltLsServerRemoved

ET = UCS_Blade_ServiceProfile

Service profile c3b4 underlying resource removed (FaultCode:fltLsServerRemoved, FaultIndex:3860463)

S = Major

N = 10.13.2.8

EN = fltAdaptorExtIfLinkDown

ET = UCS_Adapter

Adapter uplink interface 3/4/1/2 link state: unavailable (FaultCode:fltAdaptorExtIfLinkDown, FaultIndex:3860496)

S = Major

N = 10.13.2.8

 

Network Interface (ifIndex = 469775808) Down, should be Up (ifEntry.469775808)

S = Major

N = 10.13.2.9

 

Network Interface (ifIndex = 469775824) Down, should be Up (ifEntry.469775824)

S = Major

N = 10.13.2.8

 

Link Down (server 3/4, VNIC eth0)

S = Major

N = 10.13.2.9

 

Link Down (server 3/4, VNIC eth1)

S = Major

N = 10.13.2.8

 

Link Down (server 3/4, VHBA fc0)

S = Major

N = 10.13.2.9

 

Link Down (server 3/4, VHBA fc1)

S = Major

N = 10.13.2.8

 

Network Interface (ifIndex = 520224960) Down, should be Up (ifEntry.520224960)

S = Major

N = 10.13.2.9

 

Network Interface (ifIndex = 520224960) Down, should be Up (ifEntry.520224960)

S = Minor

N = 10.13.2.8

 

Fibre Channel Trunk Interface Down, Port Gracefully Shutdown (fcTrunkIfEntry.503317342.301)

S = Minor

N = 10.13.2.9

 

Fibre Channel Trunk Interface Down, Port Gracefully Shutdown (fcTrunkIfEntry.503317343.302)
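Several UCS summaries in the table above end with a "(FaultCode:..., FaultIndex:...)" suffix. A minimal sketch of extracting that pair with a regular expression follows, which can help when grouping events by fault type. The function name is hypothetical; summaries without the suffix, such as the interface and link events, return None.

```python
import re
from typing import Optional, Tuple

# Matches the "(FaultCode:<name>, FaultIndex:<number>)" suffix of a summary.
FAULT = re.compile(r"\(FaultCode:\s*(\w+),\s*FaultIndex:\s*(\d+)\)")


def extract_fault(summary: str) -> Optional[Tuple[str, int]]:
    """Return (fault code, fault index) if the summary carries them."""
    match = FAULT.search(summary)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


print(extract_fault(
    "Service profile c3b4 underlying resource removed "
    "(FaultCode:fltLsServerRemoved, FaultIndex:3860463)"
))
print(extract_fault("Link Down (server 3/4, VNIC eth0)"))
```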


Service Tree Event Overlay Location and Content

SIA events are overlaid on the service tree view portlet.

Table A-128 Observed Service Tree Events for CP5

Location
Summary

...-> Cluster_Availability --> Node:CUP-80-pub

PerformancePollingStopped::Component= cup-80-pub.customer.com; Error Message String= 26-Oct-2012 16:16:28 EDT,cup-80-pub.customer.com,192.6.6.190,Cannot collect data. The device is experiencing communication problems. Device may be in partially monitored state. Check HTTP(S) credentials.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=PerformancePollingStopped >;

...-> Cluster_Availability --> Node:CUP-80-sub

PerformancePollingStopped::Component= cup-80-sub.customer.com; Error Message String= 26-Oct-2012 16:16:28 EDT,cup-80-sub.customer.com,192.6.6.191,Cannot collect data. The device is experiencing communication problems. Device may be in partially monitored state. Check HTTP(S) credentials.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=PerformancePollingStopped >;

...-> Cluster_Availability--> Node:CUP-80-pub

Unresponsive::Component= cup-80-pub.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Hardware:VMware, 4 Intel(R) Xeon(R) CPU X5680 @ 3.33GHz, 6144 MB Memory: Software:UCOS 4.0.0.0-44; DiscoveredFirstAt= 10-24-2012 19:08:32; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.6.190; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 10-26-2012 06:05:53; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=Unresponsive >;

...-> Cluster_Availability--> Node:CUP-80-sub

Unresponsive::Component= cup-80-sub.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Hardware:VMware, 4 Intel(R) Xeon(R) CPU X5680 @ 3.33GHz, 6144 MB Memory: Software:UCOS 4.0.0.0-44; DiscoveredFirstAt= 10-24-2012 22:21:22; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.6.191; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 10-26-2012 06:05:47; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=Unresponsive >;

...-> VM Availability

The virtual machine CUP-80-pub running on host 10.13.3.34 is Disconnected. Message: KVM_VM_Disconnected_Cisco_HCM [(Event_Type=N"VmDisconnectedEvent") ON VM:tb3-vcent-10.13.3.34:ESX ON 2530557 (Event_Type=VmDisconnectedEvent)]

...-> VM Availability

The virtual machine CUP-80-sub running on host 10.13.3.34 is Disconnected. Message: KVM_VM_Disconnected_Cisco_HCM [(Event_Type=N"VmDisconnectedEvent") ON VM:tb3-vcent-10.13.3.34:ESX ON 2530558 (Event_Type=VmDisconnectedEvent)]


Next Steps


Step 1 The VM on the host is automatically brought up in another host if HA is enabled.

Step 2 Bring the original host back as follows:

a. Troubleshoot and resolve the blade issue.

b. Manually power on the CUCMIP VM if HA is not enabled and the VM is not configured to restart with the host.

c. Drag and drop the CUCMIP VM to the original host if HA is enabled.


CP6 - CUCMIP UCS Chassis Failure

This use case describes the events that Prime Central for HCS receives if the chassis hosting CUCM, CUCxn, and CUCMIP nodes loses power. This type of incident generates RC and SI events. The CUCMIP VM is automatically brought up in another host if HA is enabled on the cluster. If HA is not configured for the cluster, CUCMIP nodes stay down until the chassis is powered on.

In the following example, the same chassis hosts all UC VMs for customer 80.

Observed RC-EL Events

While the chassis powers off and on, numerous synthetic RCA events may be observed, including UCS_Chassis_Fault, UCS_Blade_Avlblty, VC_Host_Avlblty, and UCS_BladeLinks. The UCS_Chassis_Fault synthetic RCA event is the root cause. Additional root cause events were observed during our testing because of issues outlined in the note following the table.

Table A-129 Observed RC-EL Events for CP6

Severity
EventTypeId
Summary

Critical

UCS_Chassis_Fault

Synthetic Event for UCS_Chassis_Fault group events from 10.13.2.8

Critical

OM_CUCM_OM_Connectivity

Synthetic Event for OM_CUCM_OM_Connectivity group events from CUCM-CL-C080-1

Critical

VC_Host_Avlblty

Synthetic Event for VC_Host_Avlblty group events from 10.13.3.33



Note Event OM_CUCM_OM_Connectivity shows up as the root cause event because the cluster level event does not participate in the event correlation dependency tree in the current release. VC_Host_Avlblty shows up as the root cause event because of the DDTS CSCuc06575 - Some VC_Host_Avlblty events remained as root cause during chassis failure.


Observed SI-EL Events

CUCM voice service impacts voice mail and presence. In this example, CUCxn and CUCMIP VMs are hosted in the same chassis, and all voice, voice mail, and presence services are affected.

Table A-130 Observed SI-EL Events for CP6 

Severity (S)/Customer (C)/Node (N)
EventName (EN)/EventTypeId (ET)
Summary

S = Major

N = 10.13.2.8

EN = fltAdaptorExtIfLinkDown

ET = UCS_Adapter

Adapter uplink interface 3/1/1/2 link state: unavailable (FaultCode:fltAdaptorExtIfLinkDown, FaultIndex:3711164)

S = Indeterminate

N = 10.13.2.8

EN = fltAdaptorUnitAdaptorReachability

ET = default

Adapter 3/1/1 is unreachable (FaultCode:fltAdaptorUnitAdaptorReachability, FaultIndex:3695955)

S = Major

N = 10.13.2.8

EN = fltEtherSwitchIntFIoSatelliteConnectionAbsent

ET = UCS_PortsLinks

No link between IOM port 3/1/1 and fabric interconnect A:1/9 (FaultCode:fltEtherSwitchIntFIoSatelliteConnectionAbsent, FaultIndex:3468974)

S = Major

N = 10.13.2.8

EN = fltDcxVcMgmtVifDown

ET = UCS_Mgmt_Link

IOM 3 / 1 (A) management VIF 3 down, reason None (FaultCode:fltDcxVcMgmtVifDown, FaultIndex:3468977)

S = Major

N = 10.13.2.8

EN = fltPortPIoLinkDown

ET = UCS_Etherne

ether port 10 on fabric interconnect A oper state: link-down, reason: Link failure or notconnected(FaultCode:fltPortPIoLinkDown, FaultIndex:3468975)

S = Major

N = 10.13.2.8

EN = fltEquipmentIOCardUnsupportedConnectivity

ET = default

IOM 3/1 (A) current connectivity does not match discovery policy: unsupported-connectivity (FaultCode: fltEquipmentIOCardUnsupportedConnectivity, FaultIndex:3695902)

S = Major

N = 10.13.2.8

EN = fltEquipmentIOCardUnsupportedConnectivity

ET = default

IOM 3/1 (A) current connectivity does not match discovery policy: unsupported-connectivity (FaultCode: fltEquipmentIOCardUnsupportedConnectivity, FaultIndex:3695902)

S = Major

N = 10.13.2.8

EN = fltLsServerInaccessible

ET = default

Service profile c3b1 cannot be accessed (FaultCode:fltLsServerInaccessible, FaultIndex:3695961)

S = Warning

C = C080

EN = RTMTDataMissing

ET = OM_CUCM_OM_Connectivity

RTMTDataMissing::Component= VECUCM-CL-C080-1; CallManagerList= 192.6.4.186,192.6.4.193,192.6.4.188,192.6.4.187; ReasonForRTMTDataMissing= Unable to communicate with RTMT on publisher; CustomerName= C080; Default Event Name= RTMTDataMissing; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=RTMTDataMissing >;

S = Warning

C = C080

EN = RTMTDataMissing

ET = Default

RTMTDataMissing::Component= cucxn-80-pub.customer.com; Name= cucxn-80-pub.customer.com; HostDescription= Hardware:VMware, 1 Intel(R) Xeon(R) CPU X5680 @ 3.33GHz, 6144 MB Memory: Software:UCOS 5.0.0.0-2; Reason= Error collecting RTMT data. Error reason: HTTP communication error; Default Event Name= RTMTDataMissing; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=RTMTDataMissing >;


In a few instances of chassis failure, the fltEquipmentChassisPowerProblem event (EventTypeId UCS_Chassis_Avlblty) was observed and was marked as 'Unknown' rather than as a 'Root Cause' or 'Symptom' event.

Service Tree Event Overlay Location and Content

Table A-131 Observed Service Tree Events for CP6 

Location
Summary

...-> Cluster_Availability --> Node:CUP-80-pub

PerformancePollingStopped::Component= cup-80-sub.customer.com; Error Message String= 25-Oct-2012 16:16:33 EDT,cup-80-sub.customer.com,192.6.6.191,Cannot collect data. The device is experiencing communication problems. Device may be in partially monitored state. Check HTTP(S) credentials.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=PerformancePollingStopped >;

...-> Cluster_Availability--> Sub: CUCM-80-sub2

PerformancePollingStopped::Component= cucm-80-sub2.customer.com; Error Message String= 25-Oct-2012 16:16:33 EDT,cucm-80-sub2.customer.com,192.6.6.192,Cannot collect data. The device is experiencing communication problems. Device may be in partially monitored state. Check HTTP(S) credentials.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=PerformancePollingStopped >;

...-> Cluster_Availability--> Pub: CUCM-80-pub

PerformancePollingStopped::Component= cucm-80-pub.customer.com; Error Message String= 25-Oct-2012 16:16:33 EDT,cucm-80-pub.customer.com,192.6.6.186,Cannot collect data. The device is experiencing communication problems. Device may be in partially monitored state. Check HTTP(S) credentials.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=PerformancePollingStopped >;

...-> Cluster_Availability--> Sub: CUCxn-80-sub

PerformancePollingStopped::Component= cucxn-80-sub.customer.com; Error Message String= 25-Oct-2012 16:16:33 EDT,cucxn-80-sub.customer.com,192.6.4.190,Cannot collect data. The device returned no data from a required MIB.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=PerformancePollingStopped >;

...-> Cluster_Availability--> Pub: CUCxn-80-pub

PerformancePollingStopped::Component= cucxn-80-pub.customer.com; Error Message String= 25-Oct-2012 16:16:33 EDT,cucxn-80-pub.customer.com,192.6.4.189,Cannot collect data. The device returned no data from a required MIB.; Default Event Name= PerformancePollingStopped; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=PerformancePollingStopped >;

...-> Cluster_Availability--> Node:CUP-80-pub

Unresponsive::Component= cup-80-pub.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Hardware:VMware, 4 Intel(R) Xeon(R) CPU X5680 @ 3.33GHz, 6144MB Memory: Software:UCOS 4.0.0.0-44; DiscoveredFirstAt= 10-24-2012 19:08:32; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.6.190; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 10-25-2012 06:06:03; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=Unresponsive >;

...-> Cluster_Availability--> Sub: CUCM-80-sub1

Unresponsive::Component= cucm-80-sub1.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Linux release:2.6.18-194.26.1.el5PAE machine:i686; DiscoveredFirstAt= 10-17-2012 18:49:30; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.6.187; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 10-24-2012 06:04:52; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=Unresponsive >;

...-> Cluster_Availability--> Sub: CUCM-80-sub2

Unresponsive::Component= cucm-80-sub2.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Linux release:2.6.18-194.26.1.el5PAE machine:i686; DiscoveredFirstAt= 10-24-2012 23:13:45; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.6.192; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 10-25-2012 06:02:42; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=Unresponsive >;

...-> Cluster_Availability--> Pub: CUCM-80-pub

Unresponsive::Component= cucm-80-pub.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Linux release:2.6.18-194.26.1.el5PAE machine:i686; DiscoveredFirstAt= 10-17-2012 18:49:27; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.6.186; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 10-25-2012 06:03:58; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=Unresponsive >;

...-> Cluster_Availability--> Sub: CUCxn-80-sub

Unresponsive::Component= cucxn-80-sub.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Hardware:VMware, 1 Intel(R) Xeon(R) CPU X5680 @ 3.33GHz, 6144 MB Memory: Software:UCOS 5.0.0.0-2; DiscoveredFirstAt= 10-24-2012 17:05:42; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.4.190; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 10-25-2012 05:06:03; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=Unresponsive >;

...-> Cluster_Availability--> Pub: CUCxn-80-pub

Unresponsive::Component= cucxn-80-pub.customer.com; SystemObjectID= .1.3.6.1.4.1.9.1.1348; Description= Hardware:VMware, 1 Intel(R) Xeon(R) CPU X5680 @ 3.33GHz, 6144 MB Memory: Software:UCOS 5.0.0.0-2; DiscoveredFirstAt= 10-24-2012 19:05:42; Type= HOST; DisplayClassName= Host; SNMPAddress= 192.6.6.188; IsManaged= true; Vendor= CISCO; DiscoveredLastAt= 10-25-2012 06:06:03; Default Event Name= Unresponsive; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=Unresponsive >;

...-> VM Availability

The virtual machine CUCxn-80-sub running on host 10.13.3.31 is Disconnected. Message: KVM_VM_Disconnected_Cisco_HCM [(Event_Type=N"VmDisconnectedEvent") ON VM:tb3-vcent-10.13.3.31:ESX ON 2301156 (Event_Type=VmDisconnectedEvent)]

...-> VM Availability

The virtual machine CUCM-80-sub1 running on host 10.13.3.32 is Disconnected. Message: KVM_VM_Disconnected_Cisco_HCM [(Event_Type=N"VmDisconnectedEvent") ON VM:tb3-vcent-10.13.3.32:ESX ON 2529034 (Event_Type=VmDisconnectedEvent)]

...-> VM Availability

The virtual machine CUCM-80-sub2 running on host 10.13.3.32 is Disconnected. Message: KVM_VM_Disconnected_Cisco_HCM [(Event_Type=N"VmDisconnectedEvent") ON VM:tb3-vcent-10.13.3.32:ESX ON 2529036 (Event_Type=VmDisconnectedEvent)]

...-> VM Availability

The virtual machine CUP-80-pub running on host 10.13.3.34 is Disconnected. Message: KVM_VM_Disconnected_Cisco_HCM [(Event_Type=N"VmDisconnectedEvent") ON VM:tb3-vcent-10.13.3.34:ESX ON 2529027 (Event_Type=VmDisconnectedEvent)]

...-> VM Availability

The virtual machine CUCM-80-pub running on host 10.13.3.32 is Disconnected. Message: KVM_VM_Disconnected_Cisco_HCM [(Event_Type=N"VmDisconnectedEvent") ON VM:tb3-vcent-10.13.3.32:ESX ON 2529032 (Event_Type=VmDisconnectedEvent)]

...-> VM Availability

The virtual machine CUCxn-80-pub running on host 10.13.3.33 is Disconnected. Message: KVM_VM_Disconnected_Cisco_HCM [(Event_Type=N"VmDisconnectedEvent") ON VM:tb3-vcent-10.13.3.33:ESX ON 2529018 (Event_Type=VmDisconnectedEvent)]

...-> CUP-80-pub-> VoiceService

Meta event for CUP Voice Service - C080

...-> CUCxn-CL-C080-1-> VoiceService

Meta event for CUCxn Voice Service - C080
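The VM Availability summaries in Table A-131 each name a virtual machine and the ESXi host it was running on. As an illustrative sketch only (the regular expression and the function name group_vms_by_host are assumptions, not part of Prime Central for HCS), the summaries can be grouped by host to see which hosts in the powered-off chassis were affected:

```python
import re
from collections import defaultdict

# Matches the human-readable prefix of the VM Availability summaries.
# The pattern is an assumption based on the rows in Table A-131.
VM_DISCONNECTED = re.compile(
    r"The virtual machine (?P<vm>\S+) running on host (?P<host>\S+) is Disconnected\."
)

def group_vms_by_host(summaries):
    """Return {host: [vm, ...]} for every summary that matches the pattern."""
    grouped = defaultdict(list)
    for text in summaries:
        match = VM_DISCONNECTED.search(text)
        if match:
            grouped[match.group("host")].append(match.group("vm"))
    return dict(grouped)

# Prefixes copied from the VM Availability rows of Table A-131.
summaries = [
    "The virtual machine CUCxn-80-sub running on host 10.13.3.31 is Disconnected.",
    "The virtual machine CUCM-80-sub1 running on host 10.13.3.32 is Disconnected.",
    "The virtual machine CUCM-80-sub2 running on host 10.13.3.32 is Disconnected.",
    "The virtual machine CUP-80-pub running on host 10.13.3.34 is Disconnected.",
    "The virtual machine CUCM-80-pub running on host 10.13.3.32 is Disconnected.",
    "The virtual machine CUCxn-80-pub running on host 10.13.3.33 is Disconnected.",
]

vms_by_host = group_vms_by_host(summaries)
for host in sorted(vms_by_host):
    print(host, vms_by_host[host])
```

In this sample, four hosts are affected and host 10.13.3.32 carries three of the six disconnected virtual machines.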


Next Steps


Step 1 Cross-launch to UCSM to confirm that the chassis is powered off.

Step 2 Power on the chassis to clear the events.


CP7 - Application Resources Degradation - CUCMIP

This use case describes the events that are generated if the threshold for available hard disk space is crossed. This type of incident generates RC and SI events.

Observed RC-EL Events

The following table shows RC-EL events observed during testing.

Table A-132 Observed RC-EL Events for CP7

Severity
EventTypeID
Summary

Minor

OM_CUP_App_Resources

Synthetic Event for OM_CUP_App_Resources group events from cup-82-pub.customer.com


Observed SI-EL Events

Table A-133 Observed SI-EL Events for CP7

Severity
Summary

Minor

Overall Attribute of the Customer_Presence_Service_Template tag of CUP-82-pub is Marginal.


Observed Other-EL Events

None.

Service Tree Event Overlay Location and Content

SIA events are overlaid on the service in the tree view portlet. Table A-134 shows service tree events observed during testing.

Table A-134 Observed Service Tree for CP7

Location
Summary

...-> Cluster_Availability--> Pub: CUP-82-pub

InsufficientFreeHardDisk::Component= DISK-cup-82-pub.customer.com/3; HardDiskTotalSize= 19280 MB; FreeHardDiskThreshold= 15; FreeHardDiskInPercentage= 0 %; HardDiskUsed= 19268 MB; Default Event Name= InsufficientFreeHardDisk; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=InsufficientFreeHardDisk >;
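The event summary above follows an "EventName::Key= value; Key= value;" layout. The following minimal sketch splits such a summary into fields and recomputes the free disk percentage from the reported sizes. The helper names parse_event_summary and free_disk_percent are hypothetical, and the sketch assumes that FreeHardDiskThreshold is a percentage of free space, which the source does not state explicitly.

```python
def parse_event_summary(summary):
    """Split 'EventName::Key= value; Key= value;' into (event_name, fields)."""
    event_name, _, body = summary.partition("::")
    fields = {}
    for part in body.split(";"):
        key, sep, value = part.partition("=")
        if sep:  # skip empty fragments such as the one after the final ';'
            fields[key.strip()] = value.strip()
    return event_name.strip(), fields

def free_disk_percent(fields):
    """Recompute free space as a percentage of total size (both in MB)."""
    total_mb = float(fields["HardDiskTotalSize"].split()[0])
    used_mb = float(fields["HardDiskUsed"].split()[0])
    return 100.0 * (total_mb - used_mb) / total_mb

# Summary copied from Table A-134, with the DescriptionURL field left out.
summary = (
    "InsufficientFreeHardDisk::Component= DISK-cup-82-pub.customer.com/3; "
    "HardDiskTotalSize= 19280 MB; FreeHardDiskThreshold= 15; "
    "FreeHardDiskInPercentage= 0 %; HardDiskUsed= 19268 MB; "
    "Default Event Name= InsufficientFreeHardDisk;"
)

name, fields = parse_event_summary(summary)
free_pct = free_disk_percent(fields)
# Assumption: the threshold is expressed as a percentage of free space.
threshold_pct = float(fields["FreeHardDiskThreshold"])
below_threshold = free_pct < threshold_pct
print(name, round(free_pct, 2), below_threshold)
```

With 19268 MB used out of 19280 MB, only 12 MB (about 0.06 percent) is free, which is consistent with the 0 % value reported in the event.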


Next Steps

Manually remove the dummy file from the CUCMIP server.

The events clear automatically; however, some can take up to 60 minutes to clear.

Should there be a different type of OS failure, other recovery steps would be required.

CP11 - IM Resources Exceeded - CUCMIP

This use case describes the events that the Prime Central for HCS dashboard displays if a VM running CUCMIP exceeds the threshold value for the number of TextConferenceRooms opened via the Jabber client. Prime Central for HCS generates RC and SI events for such incidents.

Observed RC-EL Events

When the threshold value is reached, numerous synthetic RCA events are observed, including VC_VM_Avlblty and OM_CUP_IM_Resources.

Table A-135 Observed RC-EL Events for CP11

Severity
EventTypeID
Summary

Critical

OM_CUP_IM_Resources

Synthetic Event for OM_CUP_IM_Resources group events from cup-82-pub.customer.com


Observed SI-EL Events

Table A-136 Observed SI-EL Events for CP11

Severity
Summary

Critical

Overall Attribute of the Customer_Presence_Service_Template tag of CUP-82-pub is Marginal.


Observed Other-EL Events

None.

Service Tree Event Overlay Location and Content

SIA events are overlaid on the service in the tree view portlet. Table A-137 shows service tree events observed during testing.

Table A-137 Observed Service Tree for CP11

Location
Summary

...-> Cluster_Availability--> Pub: CUP-82-pub

TextConferenceRoomsExceeded::Component= TextConferenceRooms-cup-82-pub.customer.com; Threshold Value= 3; Violation Value= 5; Default Event Name= TextConferenceRoomsExceeded; DescriptionURL= < http://150.0.0.52:1741/CSCOnm/servlet/com.cisco.nm.help.ServerHelpEngine?tag=TextConferenceRoomsExceeded >;


Next Steps

Close chat rooms until the number of open chat rooms falls below the threshold value.
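As a worked example of this step, the sketch below computes how many chat rooms must be closed for the count to fall below the threshold, using the Threshold Value and Violation Value reported in Table A-137. The function name rooms_to_close is hypothetical, and the sketch assumes that the Violation Value is the current number of open rooms.

```python
def rooms_to_close(open_rooms, threshold):
    """Number of rooms to close so the open count is strictly below threshold."""
    target = max(threshold - 1, 0)  # highest open count that is still below threshold
    return max(open_rooms - target, 0)

# Values from the TextConferenceRoomsExceeded event: Threshold Value= 3, Violation Value= 5.
to_close = rooms_to_close(open_rooms=5, threshold=3)
print(to_close)  # 3
```

Closing three rooms leaves two open, which is below the threshold of 3.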