Failure, Failover, and Recovery
The contact center environment as a whole is intended to be redundant and self-healing. In many cases, this makes the failover and recovery from a failure nearly invisible.
This subject is discussed in greater detail in the following documents:
•
Cisco ICM Software Administration Guide for Cisco ICM Enterprise Edition (Fault Tolerance chapter):
http://www.cisco.com/application/pdf/en/us/guest/products/ps1001/c1693/ccmigration_09186a00804d7126.pdf
•
Cisco CallManager System Guide:
http://www.cisco.com/en/US/products/sw/voicesw/ps556/products_administration_guide_book09186a008066fa72.html
•
Cisco CallManager Features and Services Guide:
http://www.cisco.com/en/US/products/sw/voicesw/ps556/products_administration_guide_book09186a008066fa74.html
This topic provides a description of specific test cases that were executed as a part of Cisco Unified Communications System Release 5.0(2) failover testing. This testing was done to verify the redundancy and failover capabilities of specific components such as Gatekeepers, WAN Access Routers, and the private connection between the ROGGERs (CallRouter and Logger) in the data centers.
For additional failover testing that was conducted in the contact center environment, see the Failover test cases and results at: http://www.cisco.com/iam/unified/ipcc2/System_Test_Results.htm
Note
Much of the testing was done with specific test tools, simulated phones, and simulated agent desktops. In an actual customer setting, real agents with real phones might attempt to shut down their desktops, force logins, etc. Such events are not generally factored into the testing done with simulation tools.
This topic includes the following sections:
•
Failure with Failover Testing
–
Cisco Unified Customer Voice Portal Post-Routed Call Flow
•
Failure without Failover Testing
–
Private Connection Between Roggers
Failure with Failover Testing
This section discusses the failover testing that was done with contact center components that have redundancy capabilities in the event of a failure.
Cisco Unified Customer Voice Portal Post-Routed Call Flow
Test 1: Cisco Unified Customer Voice Portal Gatekeeper Failure
Pre-Test Conditions
The following describes the test conditions for this particular test:
•
Test site involved is Site3.
•
Inbound calls at Site3 are routed from the Cisco Unified Customer Voice Portal (Unified CVP) gateway to the Unified CVP Voice Browser.
•
Two Gatekeepers (GK1 and GK2) in Site1 are deployed in an HSRP redundancy model.
•
Cisco Unified Intelligent Contact Management (Unified ICM) script is set up to first connect the call to a Type 7 VRU (VXML-enabled Gateway) and collect digits.
•
All agents in the target skill group are flagged as NOT_READY.
•
The Unified ICM script is then set up to queue the incoming call at the VRU.
•
The status for an agent in that skill group is set to become READY after the call has been queued for 30 seconds.
Test
The following describes the failover testing done for active Unified CVP Post-Routed calls:
1.
Place calls from a PSTN call generator to the Unified CVP gateway at Site3.
2.
Verify that the VRU plays the appropriate media files and provides queue treatment for the incoming calls.
3.
Disable the primary Gatekeeper (GK1) in the HSRP cluster.
4.
Place additional calls from the PSTN call generator. These calls are now routed through the secondary Gatekeeper (GK2) in the HSRP cluster.
5.
Make agents available in Site3 for the skill groups selected by the caller and answer the calls from a Cisco Agent Desktop (CAD).
Results
The following results were verified in the above test:
•
All active calls remained active and no calls were dropped.
•
Call failures did not occur during the failover between the Unified CVP gatekeepers.
•
Announcements (Unified IP IVR) and music were heard by the callers.
•
Pop-up errors were not displayed on the agent desktop.
•
The voice path was operational in both directions, to and from the PSTN.
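As a rough illustration of the active/standby behavior exercised in this test, the following Python sketch models an HSRP-style group in which the reachable member with the highest priority is the active one. The Gatekeeper class, the priority values, and the active_gatekeeper helper are hypothetical names chosen for illustration. They are not part of any Cisco software, and the sketch ignores HSRP timers, preemption, and the shared virtual IP address.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Gatekeeper:
    name: str
    priority: int   # hypothetical priority value; the higher value wins
    up: bool = True


def active_gatekeeper(group: List[Gatekeeper]) -> Optional[Gatekeeper]:
    """Return the reachable gatekeeper with the highest priority, or None."""
    candidates = [gk for gk in group if gk.up]
    return max(candidates, key=lambda gk: gk.priority, default=None)


# Assumed priorities: GK1 is configured as the primary, GK2 as the secondary.
gk1 = Gatekeeper("GK1", priority=110)
gk2 = Gatekeeper("GK2", priority=100)
group = [gk1, gk2]

before = active_gatekeeper(group).name   # primary handles new calls
gk1.up = False                           # corresponds to step 3: disable GK1
after = active_gatekeeper(group).name    # secondary takes over

print(before, after)
```

In this simplified model, new calls placed after GK1 is disabled are handled by GK2, which matches the routing described in step 4.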
Failure without Failover Testing
This section discusses the failover testing that was done with contact center components that did not have redundancy capabilities in the event of a failure.
WAN Access Router
Pre-Test Conditions
The following describes the test conditions for this particular test:
•
Test site involved is Site3.
•
All links for Site3 are up and active.
•
A WAN router is deployed at this site for communication with other sites across the Frame Relay cloud.
•
Calls at Site3 are in progress in the IOS Voice Browser and on CAD desktops with at least one Supervisor monitoring an agent call.
•
There is no backup implemented for the serial interface on the WAN router at Site3.
Test
The following describes the failover testing done for the WAN Access Router (without a backup WAN link):
1.
Disable the serial interface of the WAN router at Site3 and observe the impact to system behavior.
2.
Verify the results of the above procedure as described in After Disabling the Serial Interface.
3.
Enable the serial interface of the WAN router at Site3 and observe the impact to system behavior.
4.
Verify the results of the above procedure as described in After Enabling Serial Interface.
Results
The following results were verified in the above test:
Note
No impact is expected to Site1 and Site5 call processing or agent states. All failures occur at Site3.
After Disabling the Serial Interface
For new calls generated from the PSTN:
•
New inbound calls to the IOS Voice Browser (Gateway) can behave in one of two ways, depending upon the gateway configuration:
–
They can fail with a busy tone.
- or -
–
They can be re-routed or "hairpinned," using a Gateway Redirect Dial Peer, as new calls to the Unified CVP at another inter-cluster site.
For transient calls (already in the IOS Voice Browser and being transferred to agents):
•
If the call was re-directed by the IOS Voice Browser to an agent, the call continued the transfer and rang at the agent phone. This occurred only if the call reached the agent phone before the phone realized that it had lost the WAN connection with its Cisco Unified CallManager (Unified CallManager).
•
If the call had not yet been re-directed by the IOS Voice Browser to an agent, the call either failed or, if a backup re-direct was programmed in the gateway, the call was "hairpinned" to another Unified CVP site within the cluster.
•
If the call was in queue on the IOS Voice Browser, the call either failed or, if a backup re-direct was programmed in the gateway, the call was "hairpinned" to another Unified CVP site within the cluster.
Note
In all cases, the H.323 call survivability timer impacted the life of the call. Be aware that all calls may be removed from the ingress gateway after this timer expires.
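The outcomes listed above for new and transient calls can be summarized as a small decision function. This is only a sketch that restates the observed results. CallState, expected_outcome, and the backup_redirect flag are hypothetical names, and the function does not model the H.323 call survivability timer mentioned in the note.

```python
from enum import Enum


class CallState(Enum):
    NEW = "new inbound call from the PSTN"
    REDIRECTED = "transient call already re-directed to an agent"
    NOT_REDIRECTED = "transient call not yet re-directed to an agent"
    QUEUED = "call in queue on the IOS Voice Browser"


def expected_outcome(state: CallState, backup_redirect: bool,
                     reached_phone_in_time: bool = False) -> str:
    """Restate the observed outcome after the serial interface is disabled.

    backup_redirect: True if the gateway has a redirect dial peer configured.
    reached_phone_in_time: True if the call reached the agent phone before
    the phone detected the loss of its WAN connection.
    """
    if state is CallState.REDIRECTED:
        if reached_phone_in_time:
            return "transfer continues and rings at the agent phone"
        return "not described in the test results"
    if backup_redirect:
        return "hairpinned to Unified CVP at another site"
    if state is CallState.NEW:
        return "fails with a busy tone"
    return "fails"


print(expected_outcome(CallState.QUEUED, backup_redirect=True))
```

A table like this can be useful when writing a test plan, because it makes explicit which combinations the results cover and which they leave unstated.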
For existing calls being handled by agents:
•
Calls stayed active and operational until the call ended normally or until the H.323 timer expired.
•
The following was the impact observed to the agent state for agents in Site3:
–
Agents lost connectivity to the CAD server at Site3 and the agent desktops were in a NOT_READY state until the WAN connection was restored.
–
Agents already engaged on calls were not able to perform any agent state change or telephony functions, such as hold/conference/transfer, during the outage event.
–
Agents were not able to log into the system during this event.
–
Unified ICM showed the remote agents at Site3 as being no longer available (the Webview report showed the remote agents as being logged out).
•
The following was the impact observed to Supervisor Monitoring:
–
A supervisor already in a monitor session with the agent and caller remained in the call for the duration of the call, or until the H.323 timer expired and the call was terminated by the gateway.
–
The supervisor was not able to perform any call control functions to barge in, intercept, or perform any conference/chat functions on the Supervisor desktop.
After Enabling Serial Interface
•
Phones were reset and re-registered with their target Unified CallManager.
•
Agent desktops reconnected to the CAD Servers and, depending upon the phone state (reset or not), allowed the agents to become READY without having to log in again.
•
Agents that were not logged in were now able to log in (if the Cisco Unified IP phones were reset properly).
•
Any calls that were in the gateway prior to the failure and were "hairpinned" to another site remained in this state until the call was terminated normally. In this situation, the gateway will not automatically terminate the calls.
Private Connection Between Roggers
Pre-Test Conditions
The following describes the test conditions for this particular test:
•
Test sites involved are Site1 and Site5.
•
Rogger A is located at Site1 and Rogger B is located at Site5.
•
All links between the two sites are up and active.
•
Calls are in progress between the Site1 and Site5 Unified CallManager clusters.
•
There is no backup implemented for the private connection between the two Roggers at Site1 and Site5.
Test
The following describes the failover testing done for the private connection between the Roggers (without a backup connection):
1.
Simulate a failure of the private link between Rogger A and Rogger B.
2.
Verify the system behavior immediately after the simulated private link failure as described in After the Private Link Failure.
3.
Place calls from a PSTN call generator and route them between Site1 and Site5.
4.
Verify the system behavior after the private link was restored as described in After the Private Link was Restored.
Results
The following results were verified in the above test:
After the Private Link Failure
•
Via the Event Viewer, both Roggers indicated a loss of heartbeats on the private network after missing five consecutive 100 ms heartbeats.
•
The Roggers sent Test Other Side (TOS) messages to the Peripheral Gateway, which responded with either Rogger A or Rogger B as the enabled side of the system.
•
Based on the Rogger that was considered the enabled side, the other Rogger became disabled.
•
The enabled Rogger then initiated the Enabled Simplex operation (visible in the MDS process window).
•
There was no impact observed to system operation or behavior.
•
There was no loss of calls or agent state across the system during this failure.
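The heartbeat rule described above (five consecutive missed heartbeats at a 100 ms interval, which is 500 ms of silence) can be sketched as a simple counter. This is an illustrative model only. The function name loss_declared_at_ms and its parameters are hypothetical, and the sketch does not represent how the MDS process is actually implemented.

```python
from typing import Iterable, Optional


def loss_declared_at_ms(received: Iterable[bool],
                        interval_ms: int = 100,
                        missed_limit: int = 5) -> Optional[int]:
    """Return the elapsed time at which link loss is declared, or None.

    received holds one value per heartbeat interval: True if a heartbeat
    arrived from the other side in that interval, False if it was missed.
    """
    missed = 0
    for tick, ok in enumerate(received, start=1):
        # Any received heartbeat resets the consecutive-miss counter.
        missed = 0 if ok else missed + 1
        if missed >= missed_limit:
            return tick * interval_ms
    return None


# Five misses in a row from the start: loss is declared after 500 ms.
print(loss_declared_at_ms([False] * 5))

# Two misses, one heartbeat, then five misses: the counter restarts.
samples = [True, True, False, False, True] + [False] * 5
print(loss_declared_at_ms(samples))
```

Because the counter resets on any received heartbeat, brief bursts of fewer than five missed heartbeats do not trigger the Test Other Side procedure in this model.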
After the Private Link was Restored
•
The Roggers observed the presence of a duplex partner and performed a state transfer operation from the Active side to the Inactive side call router.
•
Upon completion of the state transfer operation, the MDS processes reported that both Roggers were in active duplex operation.
•
No impact to call processing was observed during this event window.