Design Considerations for High Availability

Table Of Contents

Design Considerations for High Availability

What's New in This Chapter

Designing for High Availability

Data Network Design Considerations

Unified CM and CTI Manager Design Considerations

Configuring the Unified ICM Peripheral Gateway for CTI Manager Redundancy

Unified IP IVR Design Considerations

Unified IP IVR High Availability Using Unified CM

Unified IP IVR High Availability Using Unified ICM Call Flow Routing Scripts

Cisco Unified Customer Voice Portal (Unified CVP) Design Considerations

Multi-Channel Design Considerations (Cisco Email Manager Option and Cisco Collaboration Server Option)

Cisco Email Manager Option

Cisco Collaboration Server Option

Cisco Multi-Channel Options with the Cisco Interaction Manager: E-Mail Interaction Manager (EIM) and Web Interaction Manager (WIM)

Cisco Interaction Manager Architecture Overview

Unified CCE Integration

High Availability Considerations for Cisco Interaction Manager

Load Balancing Considerations

Managing Failover

Cisco Unified Outbound Option Design Considerations

Peripheral Gateway Design Considerations

Multiple PIM Connections to a Single Unified CM Cluster

Improving Failover Recovery for Customers with Large Numbers of CTI Route Points

Scaling the Unified CCE PG Beyond 2,000 Agents per Server

Redundant/Duplex Unified CCE Peripheral Gateway Considerations

Unified CM JTAPI and Peripheral Gateway Failure Detection

Unified ICM Redundancy Options

Unified CM Failure Scenarios

Unified ICM Failover Scenarios

Scenario 1: Unified CM and CTI Manager Fail

Scenario 2: Agent PG Side A Fails

Scenario 3: The Unified CM Active Call Processing Subscriber Fails

Scenario 4: The Unified CM CTI Manager Providing JTAPI Services to the Unified CCE PG Fails

Unified CCE Scenarios for Clustering over the WAN

Scenario 1: Unified ICM Central Controller or Peripheral Gateway Private Network Failure

Scenario 2: Visible Network Failure

Scenario 3: Visible and Private Networks Both Fail (Dual Failure)

Scenario 4: Unified CCE Agent Site WAN (Visible Network) Failure

Understanding Failure Recovery

Unified CM Service

Unified IP IVR

Unified ICM

Unified CM PG and CTI Manager Service

Unified ICM Voice Response Unit PG

Unified ICM Call Router and Logger

Administrative Workstation Real-Time Distributor (RTD)

CTI Server

CTI OS Considerations

Cisco Agent Desktop Considerations

Design Considerations for Unified CCE System Deployment with Unified ICM Enterprise

Parent/Child Components

The Unified ICM Enterprise (Parent) Data Center

The Unified Contact Center Express (CCX) Call Center (Child) Site

The Unified CCE Call Center (Child) Site

Parent/Child Call Flows

Typical Inbound PSTN Call Flow

Post-Route Call Flow

Parent/Child Fault Tolerance

Unified CCE Child Loses WAN Connection to Unified ICM Parent

Unified Contact Center Express Child Loses WAN Connection to Unified ICM Parent

Unified CCE Gateway PG Fails or Cannot Communicate with Unified ICM Parent

Parent/Child Reporting and Configuration Impacts

Other Considerations for the Parent/Child Model

Other Considerations for High Availability


Design Considerations for High Availability


Last revised on: August 18, 2009

 

This chapter covers several possible Unified CCE failover scenarios and explains design considerations for providing high availability of system functions and features in each of those scenarios. This chapter contains the following sections:

Designing for High Availability

Data Network Design Considerations

Unified CM and CTI Manager Design Considerations

Unified IP IVR Design Considerations

Cisco Unified Customer Voice Portal (Unified CVP) Design Considerations

Multi-Channel Design Considerations (Cisco Email Manager Option and Cisco Collaboration Server Option)

Cisco Email Manager Option

Cisco Collaboration Server Option

Cisco Multi-Channel Options with the Cisco Interaction Manager: E-Mail Interaction Manager (EIM) and Web Interaction Manager (WIM)

Cisco Unified Outbound Option Design Considerations

Peripheral Gateway Design Considerations

Understanding Failure Recovery

CTI OS Considerations

Cisco Agent Desktop Considerations

Design Considerations for Unified CCE System Deployment with Unified ICM Enterprise

Other Considerations for High Availability

What's New in This Chapter

Table 3-1 lists the topics that are new in this chapter or that have changed significantly from previous releases of this document.

 

Table 3-1 New or Changed Information Since the Previous Release of This Document

New or Revised Topic | Described in
Cisco Interaction Manager | Cisco Multi-Channel Options with the Cisco Interaction Manager: E-Mail Interaction Manager (EIM) and Web Interaction Manager (WIM)
Failover of CTI Manager or PG | Unified CM PG and CTI Manager Service
Peripheral Interface Manager (PIM) | Multiple PIM Connections to a Single Unified CM Cluster



Note: Many of the design considerations and illustrations throughout this chapter have been revised and updated. Cisco recommends reviewing the entire chapter before designing a Unified CCE system.


Designing for High Availability

Cisco Unified CCE is a distributed solution that uses numerous hardware and software components, and it is important to design each system in a way that eliminates any single point of failure or that at least addresses potential failures in a way that will impact the fewest resources in the contact center. The type and number of resources impacted will depend on how stringent your requirements are, the budget for fault tolerance, and which design characteristics you choose for the various Unified CCE components, including the network infrastructure. A good Unified CCE design will be tolerant of most failures (defined later in this section), but not all failures can be made transparent.

Cisco Unified CCE is a solution designed for mission-critical contact centers. The successful design of any Unified CCE deployment requires a team with experience in data and voice internetworking, system administration, and Unified CCE application design and configuration.


Note: Simplex deployments are allowed for demonstration, laboratory, and other non-production environments. However, all production deployments must include redundancy for the core ICM components (Routers, Loggers, PGs, and pre-routing gateways).


Before implementing Unified CCE, use careful preparation and design planning to avoid costly upgrades or maintenance later in the deployment cycle. Always design for the worst possible failure scenario, with future scalability in mind for all Unified CCE sites.

In summary, plan ahead and follow all the design guidelines and recommendations presented in this guide and in the Cisco Unified Communications Solution Reference Network Design (SRND) guide, available at

http://www.cisco.com/en/US/products/sw/voicesw/ps556/products_implementation_design_guides_list.html

For assistance in planning and designing your Unified CCE solution, consult your Cisco or certified Partner Systems Engineer (SE).

Figure 3-1 shows a high-level design for a fault-tolerant Unified CCE single-site deployment.

Figure 3-1 Unified CCE Single-Site Design for High Availability

In Figure 3-1, each component in the Unified CCE solution is duplicated with a redundant or duplex component, with the exception of the intermediate distribution frame (IDF) switch for the Unified CCE agents and their phones. The IDF switches do not interconnect with each other, but only with the main distribution frame (MDF) switches, because it is better to distribute the agents among different IDF switches for load balancing and for geographic separation (for example, different building floors or different cities). If an IDF switch fails, all calls should be routed to other available agents in a separate IDF switch or to a Unified IP IVR queue. Follow the design recommendations for a single-site deployment as documented in the Cisco Unified Communications Solution Reference Network Design (SRND) guide, available at

http://www.cisco.com/en/US/products/sw/voicesw/ps556/products_implementation_design_guides_list.html

If designed correctly for high availability and redundancy, a Unified CCE system can lose half of its core component systems or servers and still be operational. With this type of design, no matter what happens in the Unified CCE system, calls can still be handled in one of the following ways:

Routed and answered by an available Unified CCE agent using an IP phone or desktop softphone

Sent to an available Unified IP IVR or Unified CVP port or session

Answered by the Cisco Unified Communications Manager AutoAttendant or Hunt Group

Played a Unified IP IVR or Unified CVP announcement stating that the call center is currently experiencing technical difficulties and asking the caller to call back later

Rerouted to another site with available agents or resources to handle the call

The components in Figure 3-1 can be rearranged to form two connected Unified CCE sites, as illustrated in Figure 3-2.

Figure 3-2 Unified CCE Single-Site Redundancy

Figure 3-2 emphasizes the redundancy of the single-site design in Figure 3-1. Side A and Side B are mirror images of each other. In fact, one of the main Unified CCE features that enhances high availability is the ability to add redundant/duplex components that are designed to fail over and recover automatically without any manual intervention. Core system components with redundant/duplex counterparts are interconnected over a separate Private Network path, and each detects failure of its counterpart by means of TCP keep-alive messages generated every 100 ms. The fault-tolerant design and the failure detection and recovery method are described later in this chapter.
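The keep-alive mechanism can be pictured with a minimal sketch. The 100 ms interval comes from the text above; the function name and the `missed_limit` parameter are hypothetical, and the limit of 5 used in the example is an assumption chosen only for illustration, not a value stated in this section.

```python
KEEPALIVE_INTERVAL_MS = 100  # from the text: keep-alives are generated every 100 ms


def peer_presumed_failed(last_keepalive_ms: int, now_ms: int, missed_limit: int) -> bool:
    """Return True once more than `missed_limit` keep-alive intervals have
    passed without a message from the peer.

    `missed_limit` is a hypothetical tuning parameter, not a documented value.
    """
    if missed_limit < 1:
        raise ValueError("missed_limit must be at least 1")
    silence_ms = now_ms - last_keepalive_ms
    return silence_ms > missed_limit * KEEPALIVE_INTERVAL_MS


# Example with an assumed limit of 5 missed keep-alives (500 ms of silence).
print(peer_presumed_failed(last_keepalive_ms=1_000, now_ms=1_400, missed_limit=5))
print(peer_presumed_failed(last_keepalive_ms=1_000, now_ms=1_600, missed_limit=5))
```

The point of the sketch is only that a short interval on a dedicated path lets each side notice a silent peer quickly; the actual detection rules are covered later in the chapter.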

Other components in the solution use other types of redundancy strategies. For example, Cisco Unified Communications Manager (Unified CM) uses a cluster design to provide IP phones and devices with multiple Unified CM subscribers (servers) with which to register if the primary server fails, and those devices automatically re-home to the primary when it is restored.

The following sections use Figure 3-1 as the model design to discuss issues and features that you should consider when designing Unified CCE for high availability. These sections use a bottom-up model (from a network model perspective, starting with the physical layer first) that divides the design into segments that can be deployed in separate stages.

Cisco recommends using only duplex (redundant) Unified CM, Unified IP IVR/Unified CVP, and Unified ICM configurations for all Unified CCE deployments. This chapter assumes that the Unified CCE failover feature is a critical requirement for all deployments; therefore, it presents only deployments that use a redundant (duplex) configuration, with each Unified CM cluster having at least one publisher and one subscriber. Additionally, where possible, deployments should follow the best practice of having no devices, call processing, or CTI Manager Services running on the Unified CM publisher.

Data Network Design Considerations

The Unified CCE design shown in Figure 3-3 illustrates the voice call path from the PSTN (public switched telephone network) at the ingress voice gateway to the call reaching a Unified CCE agent. The network infrastructure in the design supports the Unified CCE environment for data and voice traffic. The network, including the PSTN, is the foundation for the Unified CCE solution. If the network is poorly designed to handle failures, then everything in the contact center is prone to failure because all the servers and network devices depend on the network for highly available communications. Therefore, the data and voice networks must be a primary part of your solution design and must be addressed in the early stages for all Unified CCE implementations.


Note: Cisco recommends that, for all the Unified ICM core component servers, the NIC and Ethernet switch port be set to 100 Mbps full duplex for 10/100 links or to auto-negotiate for gigabit links.


In addition, the choice of voice gateways for a deployment is critical because some protocols offer more call resiliency than others. This chapter provides high-level information on how the voice gateways should be configured for high availability with the Unified CCE solution.

For more information on voice gateways and voice networks in general, refer to the Cisco Unified Communications Solution Reference Network Design (SRND) guide, available at

http://www.cisco.com/en/US/products/sw/voicesw/ps556/products_implementation_design_guides_list.html

Figure 3-3 High Availability in a Network with Two Voice Gateways and One Unified CM Cluster

Using multiple voice gateways avoids the problem of a single gateway failure causing blockage of all inbound and outgoing calls. In a configuration with two voice gateways and one Unified CM cluster, each gateway should register with a different primary Unified CM subscriber to spread the workload across the subscribers in the cluster. Each gateway should use another subscriber as a backup in case its primary fails. For details on setting up Unified CM for redundant service and redundancy groups related to call processing, refer to the Cisco Unified Communications Solution Reference Network Design (SRND) guide, available at

http://www.cisco.com/en/US/products/sw/voicesw/ps556/products_implementation_design_guides_list.html

With Cisco IOS voice gateways using H.323 or SIP, additional call processing is available by using TCL scripts and additional dial peers if the gateway is unable to reach its Unified CM for call control or call processing instructions. MGCP gateways do not have this built-in functionality, and the trunks that are terminated in these gateways should have backup routing or "roll-over service" from the PSTN carrier or service provider to reroute the trunk on failure or no-answer to another gateway or location.

As for sizing the gateway's trunk capacity, it is a good idea to account for failover of the gateways, building in enough excess capacity to handle the maximum busy hour call attempts (BHCA) if one or more voice gateways fail. During the design phase, first decide how many simultaneous voice gateway failures are possible and acceptable for the site. Based upon this requirement, the number of voice gateways used, and the distribution of trunks across those voice gateways, you can determine the total number of trunks required for normal and disaster modes of operation. The more you distribute the trunks over multiple voice gateways, the fewer trunks you will need in a failure mode. However, using more voice gateways or carrier PSTN trunks will increase the cost of the solution, so you should compare the cost with the benefits of being able to service calls in a gateway failure. The form-factor of the gateway is also a consideration; for example, if an entire 8-port T1 blade fails in a Cisco AS5400 voice gateway chassis, that event could impact 184 calls coming into the site.

As an example, assume a contact center has a maximum BHCA that results in the need for four T1 lines, and the company has a requirement for no call blockage in the event of a single component (voice gateway) failure. If two voice gateways are deployed in this case, then each voice gateway should be provisioned with four T1 lines (total of eight). If three voice gateways are deployed, then two T1 lines per voice gateway (total of six) would be enough to achieve the same level of redundancy. If five voice gateways are deployed, then one T1 per voice gateway (total of five) would be enough to achieve the same level of redundancy. Thus, you can reduce the number of T1 lines required by adding more voice gateways and spreading the risk over multiple physical devices.
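The arithmetic in this example can be sketched as follows. The function names are illustrative, and the model assumes that trunks are spread evenly across gateways and that the PSTN can deliver any call to any surviving gateway.

```python
import math


def t1_lines_per_gateway(required_t1: int, gateways: int, tolerated_failures: int = 1) -> int:
    """T1 lines each gateway needs so that the surviving gateways can still
    carry the full busy-hour load after `tolerated_failures` gateways fail."""
    survivors = gateways - tolerated_failures
    if survivors < 1:
        raise ValueError("at least one gateway must survive the failure")
    return math.ceil(required_t1 / survivors)


def total_t1_lines(required_t1: int, gateways: int, tolerated_failures: int = 1) -> int:
    """Total T1 lines to provision across all gateways."""
    return gateways * t1_lines_per_gateway(required_t1, gateways, tolerated_failures)


# The example above: four T1 lines of load, one gateway failure tolerated.
for gateway_count in (2, 3, 5):
    per_gateway = t1_lines_per_gateway(4, gateway_count)
    total = total_t1_lines(4, gateway_count)
    print(f"{gateway_count} gateways: {per_gateway} T1 each, {total} total")
```

Running the loop reproduces the totals of eight, six, and five T1 lines given in the text, and the same functions can be reused to compare designs that must tolerate more than one simultaneous gateway failure.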

The operational cost savings of fewer T1 lines might be greater than the one-time capital cost of the additional voice gateways. In addition to the recurring operational costs of the T1 lines, you should also factor in the carrier charges like the typical one-time installation cost of the T1 lines to ensure that your design accounts for the most cost-effective solution. Every installation has different availability requirements and cost metrics, but using multiple voice gateways is often more cost-effective. Therefore, it is a worthwhile design practice to perform this cost comparison.

After you have determined the number of trunks needed, the PSTN service provider has to configure them so that calls can be terminated onto trunks connected to all of the voice gateways (or at least more than one voice gateway). From the PSTN perspective, if the trunks going to the multiple voice gateways are configured as a single large trunk group, then all calls will automatically be routed to the surviving voice gateways when one voice gateway fails. If all of the trunks are not grouped into a single trunk group within the PSTN, then you must ensure that PSTN rerouting or overflow routing to the other trunk groups is configured for all dialed numbers.

If a voice gateway with a digital interface (T1 or E1) fails, then the PSTN automatically stops sending calls to that voice gateway because the carrier level signaling on the digital circuit has dropped. Loss of carrier level signaling causes the PSTN to busy-out all trunks on that digital circuit, thus preventing the PSTN from routing new calls to the failed voice gateway. When the failed voice gateway comes back on-line and the circuits are back in operation, the PSTN automatically starts delivering calls to that voice gateway again.

With Cisco IOS voice gateways using H.323 or SIP, it is possible for the voice gateway itself to be operational but for its communication paths to the Unified CM servers to be severed (for example, a failed Ethernet connection). To handle this situation, you can use the busyout-monitor interface command to monitor the Ethernet interfaces on a voice gateway. To place a voice port into a busyout monitor state, use the busyout-monitor interface voice-port configuration command. To remove the busyout-monitor state on the voice port, use the no form of this command. As noted previously, these gateways can also apply additional call processing when the call control interface to Unified CM is unavailable, either rerouting calls to another site or dialed number or playing a locally stored .wav file to the caller and ending the call.

With MGCP-controlled voice gateways, when the voice gateway interface to Unified CM fails, the gateway looks for secondary and tertiary Unified CM subscribers from the redundancy group. The MGCP gateway automatically fails over to the other subscribers in the group and periodically checks the health of each, marking it as available once it comes back on-line. The gateway then fails back to the primary subscriber when all calls are idle or after 24 hours (whichever comes first). If no subscribers are available, the voice gateway automatically busies-out all its trunks. This action prevents new calls from being routed to this voice gateway from the PSTN. When the voice gateway interface to Unified CM homes to the backup subscriber, the trunks are automatically idled and the PSTN should begin routing calls to this voice gateway again (assuming the PSTN has not permanently busied-out those trunks). The recommended design practice is to spread the gateways across the Unified CM call processing servers in the cluster; if all the gateways were registered to a single primary subscriber, a failure of that subscriber could affect every gateway call in the call center.
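The MGCP failover and failback rules described above can be summarized in a small decision model. This is a sketch of the behavior as stated in the text, not gateway code; the function names and the subscriber names in the example are hypothetical.

```python
from typing import Optional, Sequence, Set

FAILBACK_TIMEOUT_HOURS = 24  # from the text: fail back after 24 hours at the latest


def select_subscriber(redundancy_group: Sequence[str], reachable: Set[str]) -> Optional[str]:
    """Return the highest-priority reachable subscriber in the redundancy
    group (primary first, then secondary, then tertiary), or None."""
    for subscriber in redundancy_group:
        if subscriber in reachable:
            return subscriber
    return None


def trunks_busied_out(redundancy_group: Sequence[str], reachable: Set[str]) -> bool:
    """The gateway busies out its trunks only when no subscriber is available."""
    return select_subscriber(redundancy_group, reachable) is None


def should_fail_back(primary_reachable: bool, active_calls: int, hours_on_backup: float) -> bool:
    """Fail back to the primary when all calls are idle or after 24 hours,
    whichever comes first, provided the primary is available again."""
    if not primary_reachable:
        return False
    return active_calls == 0 or hours_on_backup >= FAILBACK_TIMEOUT_HOURS


group = ["sub1", "sub2", "sub3"]  # illustrative names
print(select_subscriber(group, {"sub2", "sub3"}))   # primary is down
print(trunks_busied_out(group, set()))              # nothing reachable
print(should_fail_back(True, active_calls=3, hours_on_backup=2.0))
```

Writing the rules out this way makes it easy to check a proposed redundancy group against the failure combinations that the site must survive.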

Voice gateways that are used with the Cisco Unified Survivable Remote Site Telephony (SRST) option for Unified CM follow a similar failover process. If the gateway is cut off from the Unified CM that is controlling it, the gateway fails over into SRST mode, which drops all voice calls and resets the gateway. Phones re-home to the local SRST gateway for call control, and calls are processed locally and directed to local phones. While the site is running in SRST mode, it is assumed that the agents also have no CTI connection from their desktops, so they are seen as not ready within the Unified CCE routing application. Therefore, Unified CCE sends no calls to these agents. When the data connection to the gateway at the site is re-established, Unified CM takes control of the gateway and phones again, allowing the agents to be reconnected to Unified CCE.

Unified CM and CTI Manager Design Considerations

Cisco Unified CM uses CTI Manager, a service that acts as an application broker and abstracts the physical binding of the application to a particular Unified CM server to handle all its CTI resources. (Refer to the Cisco Unified Communications Solution Reference Network Design (SRND) guide for further details about the architecture of the CTI Manager.) The CTI Manager and CallManager are two separate services running on a Unified CM server. Some other services running on a Unified CM server include TFTP, Cisco Messaging Interface, and Real-time Information Server (RIS) data collector services.

The main function of the CTI Manager is to accept messages from external CTI applications and send them to the appropriate resource in the Unified CM cluster. The CTI Manager uses the Cisco JTAPI link to communicate with the applications. It acts like a JTAPI messaging router. The JTAPI client library in Cisco Unified CM connects to the CTI Manager instead of connecting directly to the CallManager service. In addition, there can be multiple CTI Manager services running on different Unified CM servers in the cluster that are aware of each other (via the CallManager service, which is explained later in this section). The CTI Manager uses the same Signal Distribution Layer (SDL) signaling mechanism that the Unified CM services in the cluster use to communicate with each other. However, the CTI Manager does not directly communicate with the other CTI Managers in its cluster. (This is also explained later in detail.)

The main function of the CallManager service is to register and monitor all the Cisco Unified Communications devices. It basically acts as a switch for all the Cisco Unified Communications resources and devices in the system, while the CTI Manager service acts as a router for all the CTI application requests for the system devices. Some of the devices that can be controlled by JTAPI that register with the CallManager service include the IP phones, CTI ports, and CTI route points.

Figure 3-4 illustrates some of the functions of Unified CM and the CTI Manager.

Figure 3-4 Functions of the CallManager and the CTI Manager Services

The servers in a Unified CM cluster communicate with each other using the Signal Distribution Layer (SDL) service. SDL signaling is used only by the CallManager service to talk to the other CallManager services to make sure everything is in sync within the Unified CM cluster. The CTI Managers in the cluster are completely independent and do not establish a direct connection with each other. CTI Managers route only the external CTI application requests to the appropriate devices serviced by the local CallManager service on this subscriber. If the device is not resident on its local Unified CM subscriber, then the CallManager service forwards the application request to the appropriate Unified CM in the cluster. Figure 3-5 shows the flow of a device request to another Unified CM in the cluster.

Figure 3-5 CTI Manager Device Request to a Remote Unified CM

Although it might be tempting to register all of the Unified CCE devices to a single subscriber in the cluster and point the Peripheral Gateway (PG) to that server, this configuration would put a high load on that subscriber. If the PG were to fail in this case, the duplex PG would connect to a different subscriber, and all the CTI Manager messaging would have to be routed across the cluster to the original subscriber. It is important to distribute devices and CTI applications appropriately across all the call processing nodes in the Unified CM cluster to balance the CTI traffic and possible failover conditions.
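The effect of device placement on cross-cluster CTI traffic can be illustrated with a short model. It assumes, as described above, that a CTI Manager hands each request to its local CallManager service, which forwards the request when the device is registered elsewhere. The subscriber names (CCM1, CCM2), the phone names, and the function names are illustrative only.

```python
from typing import Dict, List


def cti_request_path(device: str, pg_subscriber: str, device_home: Dict[str, str]) -> List[str]:
    """Services that a CTI request passes through when the PG is connected
    to the CTI Manager on `pg_subscriber`."""
    home = device_home[device]
    path = [f"CTI Manager on {pg_subscriber}", f"CallManager on {pg_subscriber}"]
    if home != pg_subscriber:
        # Device is not local, so the local CallManager forwards the request.
        path.append(f"CallManager on {home}")
    return path


def forwarded_devices(pg_subscriber: str, device_home: Dict[str, str]) -> int:
    """Number of devices whose requests must be forwarded across the cluster."""
    return sum(1 for home in device_home.values() if home != pg_subscriber)


all_on_one = {f"phone{i}": "CCM1" for i in range(1, 5)}
balanced = {"phone1": "CCM1", "phone2": "CCM2", "phone3": "CCM1", "phone4": "CCM2"}

# Active PG side on CCM1, then on CCM2 after a PG failover.
for name, layout in (("all on CCM1", all_on_one), ("balanced", balanced)):
    print(name, forwarded_devices("CCM1", layout), forwarded_devices("CCM2", layout))
```

With every device on one subscriber, a PG failover moves from no forwarded requests to all of them being forwarded; with a balanced layout, the forwarded share stays the same on either side.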

The external CTI applications use a CTI-enabled user account in Unified CM. They log into the CTI Manager service to establish a connection and assume control of the Unified CM devices associated to this specific CTI-enabled user account, typically referred to as the JTAPI user or PG user. In addition, given that the CTI Managers are independent from each other, any CTI application can connect to any CTI Manager in the cluster to perform its requests. However, because the CTI Managers are independent, one CTI Manager cannot pass the CTI application to another CTI Manager upon failure. If the first CTI Manager fails, the external CTI application must implement the failover mechanism to connect to another CTI Manager in the cluster.

For example, the Agent PG handles failover for the CTI Manager by using its duplex servers, sides A and B, each of which is pointed to a different subscriber in the cluster, and by using the CTI Manager on those subscribers. It is important to note these connections from the PG are managed in hot standby mode, which means only one side of the PG is active at any given time and connected to the CTI Manager on the subscriber. The PG processes are designed to prevent both sides from trying to be active at the same time to reduce the impact of the CTI application on Unified CM. Additionally, both of the duplex PG servers (Side A and Side B) use the same CTI-enabled JTAPI or PG user to log into the CTI Manager applications. However, only one Unified CM PG side allows the JTAPI user to register and monitor the user devices to conserve system resources in the Unified CM cluster. The other side of the Unified CM PG stays in hot-standby mode, waiting to connect, log in, register, and be activated upon failure of the active side.
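The hot-standby behavior of the duplex PG can be sketched as a tiny state model. The class and method names are hypothetical, and the model covers only the case of a CTI Manager failure; it is not how the PG processes are actually implemented.

```python
from dataclasses import dataclass


@dataclass
class PgSide:
    name: str          # "A" or "B"
    cti_manager: str   # subscriber whose CTI Manager this side points to
    active: bool = False


class DuplexPg:
    """Two PG sides in hot standby: exactly one side is active at a time."""

    def __init__(self, side_a: PgSide, side_b: PgSide) -> None:
        if side_a.cti_manager == side_b.cti_manager:
            raise ValueError("each side must point to a different subscriber")
        self.sides = [side_a, side_b]
        side_a.active, side_b.active = True, False

    def active_side(self) -> PgSide:
        return next(side for side in self.sides if side.active)

    def on_cti_manager_failure(self, failed_subscriber: str) -> PgSide:
        """Activate the standby side if the active side lost its CTI Manager."""
        current = self.active_side()
        if current.cti_manager != failed_subscriber:
            return current  # the active side is unaffected
        standby = next(side for side in self.sides if not side.active)
        current.active, standby.active = False, True
        return standby


pg = DuplexPg(PgSide("A", "CCM1"), PgSide("B", "CCM2"))
print(pg.active_side().name)
print(pg.on_cti_manager_failure("CCM1").name)
```

Because the CTI Managers do not hand applications over to one another, the switch between sides has to be made by the application, which is what the `on_cti_manager_failure` method stands in for here.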

Figure 3-6 shows two external CTI applications, the Agent PG and the Unified IP IVR, using the CTI Manager. The Unified CM PG logs into the CTI Manager using the JTAPI account User 1, while the Unified IP IVR uses account User 2. Each external application uses its own specific JTAPI user account and will have different devices registered and monitored by that user. For example, the Unified CM PG (User 1) will monitor all four agent phones and the inbound CTI Route Points, while the Unified IP IVR (User 2) will monitor its CTI Ports and the CTI Route Points used for its JTAPI Triggers. Although multiple applications could monitor the same devices, this method is not recommended because it can cause race conditions between the applications trying to take control of the same physical device.

Figure 3-6 CTI Application Device Registration

Unified CM CTI applications also add to the device weights on the subscribers, adding memory objects used to monitor registered devices. These monitors are registered on the subscriber that has the connection to the external application. It is a good design practice to distribute these application-to-CTI Manager registrations across multiple subscribers to avoid overloading a single subscriber with all of the monitored object tracking.

The design of Unified CM and CTI Manager should be performed as the second design stage, right after the network design stage, and deployment should occur in this same order. The reason for this order is that the Cisco Unified Communications infrastructure must be in place to dial and receive calls using its devices before you can deploy any telephony applications. Before moving to the next design stage, make sure that a PSTN phone can call an IP phone and that this same IP phone can dial out to a PSTN phone, with all the call survivability capabilities considered for treating these calls. Also keep in mind that the Unified CM cluster design is paramount to the Unified CCE system, and any server failure in a cluster will take down two services (CTI Manager and CallManager), thereby adding an extra load to the remaining servers in the cluster.

Configuring the Unified ICM Peripheral Gateway for CTI Manager Redundancy

To enable Unified CM support for CTI Manager failover in a duplex Unified ICM Peripheral Gateway model, perform the following steps:


Step 1 Create a Unified CM redundancy group, and add subscribers to the group. (Publishers and TFTP servers should not be used for call processing, device registration, or CTI Manager use.)

Step 2 Designate two CTI Managers on different subscribers to be used for each side of the duplex Peripheral Gateway (PG), one for PG Side A and one for PG Side B.

Step 3 Assign one of the CTI Managers to be the JTAPI service of the Unified CM PG Side A. (See Figure 3-7.) Note that the setup panel on the left is for Side A of the Peripheral Gateway. It points to the CCM1 subscriber and uses the PGUser CTI-enabled user account on the Unified CM cluster.

Step 4 Assign the second CTI Manager to be the JTAPI service of the Unified CM PG Side B. (See Figure 3-7.) Note that the setup panel on the right is for Side B of the Peripheral Gateway. It points to the CCM2 subscriber and uses the same PGUser CTI-enabled user account on the Unified CM cluster. Both sides of the duplex PG pair must use the same JTAPI user in order to monitor the same devices from either side of the PG pair.


Figure 3-7 Assigning CTI Managers for PG Sides A and B
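The constraints in Steps 1 through 4 can be expressed as a simple configuration check. The dictionary keys and the function name are hypothetical and do not correspond to fields in the PG setup panels; CCM1, CCM2, and PGUser are the names used in the steps above, while the publisher name in the example is illustrative.

```python
from typing import Dict, List, Set


def validate_pg_cti_config(
    side_a: Dict[str, str],
    side_b: Dict[str, str],
    excluded_servers: Set[str],
) -> List[str]:
    """Return a list of problems with a duplex PG's CTI Manager settings.

    `excluded_servers` holds the publisher and TFTP servers, which should
    not be used for CTI Manager connections.
    """
    problems = []
    if side_a["cti_manager_host"] == side_b["cti_manager_host"]:
        problems.append("Side A and Side B must use CTI Managers on different subscribers")
    if side_a["jtapi_user"] != side_b["jtapi_user"]:
        problems.append("Side A and Side B must log in with the same JTAPI user")
    for label, side in (("Side A", side_a), ("Side B", side_b)):
        if side["cti_manager_host"] in excluded_servers:
            problems.append(f"{label} points to a publisher or TFTP server")
    return problems


side_a = {"cti_manager_host": "CCM1", "jtapi_user": "PGUser"}
side_b = {"cti_manager_host": "CCM2", "jtapi_user": "PGUser"}
print(validate_pg_cti_config(side_a, side_b, excluded_servers={"CCM-PUB"}))
```

An empty list means the pair satisfies the three rules; each violated rule adds one message.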

Unified IP IVR Design Considerations

The JTAPI subsystem in Unified IP IVR can establish connections with two CTI Managers on different subscribers in the Unified CM cluster. This feature enables Unified CCE designs to add Unified IP IVR redundancy at the CTI Manager level, similar to the Unified ICM Peripheral Gateway connections. Additionally, Cisco recommends having multiple, redundant IP-IVR servers in the design and allowing the Unified ICM call routing script to load-balance calls automatically between the available IP-IVR resources.

Figure 3-8 shows two Unified IP IVR servers configured for redundancy within one Unified CM cluster. The Unified IP IVR group should be configured so that each server is connected to a different CTI Manager service on different Unified CM subscribers in the cluster for high availability. Using the redundancy feature of the JTAPI subsystem in the Unified IP IVR server, you can implement redundancy by adding the IP addresses or host names of two Unified CMs from the cluster. Then, if one of the Unified CMs fails, the Unified IP IVR associated with that particular Unified CM will fail over to the second Unified CM.

Figure 3-8 High Availability with Two Unified IP IVR Servers and One Unified CM Cluster

Unified IP IVR High Availability Using Unified CM

You can implement Unified IP IVR port high availability by using any of the following call-forward features in Unified CM:

Forward Busy — forwards calls to another port or route point when Unified CM detects that the port is busy. This feature can be used to forward calls to another resource when a Unified IP IVR CTI port is busy due to a Unified IP IVR application problem, such as running out of available CTI ports.

Forward No Answer — forwards calls to another port or route point when Unified CM detects that a port has not picked up a call within the timeout period set in Unified CM. This feature can be used to forward calls to another resource when a Unified IP IVR CTI port is not answering due to a Unified IP IVR application problem.

Forward on Failure — forwards calls to another port or route point when Unified CM detects a port failure caused by an application error. This feature can be used to forward calls to another resource when a Unified IP IVR CTI port is busy due to a Unified CM application error.


Note When using the call forwarding features to implement high availability of Unified IP IVR ports, avoid creating a loop in the event that all the Unified IP IVR servers are unavailable. In other words, do not establish a forwarding path that leads back to the first CTI port that initiated the call forwarding.
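The loop condition described in this note can be checked offline before a forwarding plan is configured. The following is a minimal sketch, assuming the forwarding destinations have been written down by hand as a simple mapping. The port and route point names are hypothetical, and the code does not query Unified CM or use any Cisco API.

```python
from typing import Dict, List, Optional


def find_forward_loop(forward_to: Dict[str, str], start: str) -> Optional[List[str]]:
    """Follow the call-forward chain from `start`.

    `forward_to` maps a port or route point name to its forwarding
    destination (hypothetical names, entered by hand).
    Returns the looping path if the chain revisits a device, else None.
    """
    path: List[str] = []
    seen = set()
    current: Optional[str] = start
    while current is not None:
        if current in seen:
            # The chain came back to a device it already visited.
            return path[path.index(current):] + [current]
        seen.add(current)
        path.append(current)
        # A device with no entry has no forward destination: chain ends.
        current = forward_to.get(current)
    return None


# Hypothetical plan in which two IVR ports forward to each other.
looping_plan = {"ivr1_port": "ivr2_port", "ivr2_port": "ivr1_port"}
# Hypothetical plan that ends at an overflow route point instead.
safe_plan = {"ivr1_port": "ivr2_port", "ivr2_port": "overflow_rp"}

print(find_forward_loop(looping_plan, "ivr1_port"))
print(find_forward_loop(safe_plan, "ivr1_port"))
```

A plan for which the function returns a path would send a call in circles when every Unified IP IVR is unavailable; ending the chain at a destination outside the IVR group avoids that.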


Unified IP IVR High Availability Using Unified ICM Call Flow Routing Scripts

You can implement Unified IP IVR high availability through Unified ICM call flow routing scripts. You can prevent calls from queuing to an inactive Unified IP IVR by using the Unified ICM scripts to check the Unified IP IVR Peripheral Status before sending the calls to it. For example, you can program a Unified ICM script to check whether the Unified IP IVR is active by using an IF node. Alternatively, you can configure a Translation Route to the Voice Response Unit (VRU) node, using its consider if field, to select the Unified IP IVR with the most idle ports so that calls are distributed evenly on a call-by-call basis. This method can be modified to load-balance ports across multiple Unified IP IVRs, and it can address all of the Unified IP IVRs on the cluster in the same Translation Route or Send to VRU node.
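The selection logic described above (skip any Unified IP IVR that is not active, then prefer the one with the most idle ports) can be illustrated outside the script editor. The sketch below is plain Python, not Unified ICM script syntax. The `IvrStatus` fields and server names are hypothetical stand-ins for the peripheral status and idle-port values a routing script would evaluate.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IvrStatus:
    name: str        # hypothetical server label
    online: bool     # stands in for the Peripheral Status check
    idle_ports: int  # stands in for the idle-port count used for selection


def select_ivr(ivrs: List[IvrStatus]) -> Optional[IvrStatus]:
    """Return the online IVR with the most idle ports, or None if none qualifies."""
    candidates = [ivr for ivr in ivrs if ivr.online and ivr.idle_ports > 0]
    if not candidates:
        # No usable IVR: the caller of this function must apply other treatment.
        return None
    return max(candidates, key=lambda ivr: ivr.idle_ports)


group = [
    IvrStatus("ivr-a", online=False, idle_ports=30),  # inactive, never chosen
    IvrStatus("ivr-b", online=True, idle_ports=10),
    IvrStatus("ivr-c", online=True, idle_ports=12),
]
chosen = select_ivr(group)
print(chosen.name if chosen else "no IVR available")
```

Because the choice is re-evaluated for every call, load shifts toward whichever server currently has the most free capacity, and a failed server stops receiving new calls as soon as its status changes.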

In Unified System CCE, the System PG automatically performs the Translation Route to VRU function when a routing script requests that a call be queued or a message be played to the caller. The System PG load-balances the call across all available IP IVRs configured in the Unified System CCE.


Note All calls at the Unified IP IVR are dropped if the Unified IP IVR server itself fails. It is important to distribute calls across multiple Unified IP IVR servers to minimize the impact of such a failure. In Unified IP IVR Release 4.0(x), there is a default script to handle cases where the Unified IP IVR loses the link to the IVR Peripheral Gateway, so that the calls are not lost.


Cisco Unified Customer Voice Portal (Unified CVP) Design Considerations

The Unified CVP can be deployed with Unified CCE as an alternative to Unified IP IVR for call treatment and queuing. Unified CVP is different from Unified IP IVR in that it does not rely on Unified CM for JTAPI call control. Unified CVP uses H.323 or SIP for call control and is used in front of Unified CM or other PBX systems as part of a hybrid Unified CCE or migration solution. (See Figure 3-9.)

Figure 3-9 High Availability with Two Unified CVP Call Control Servers Using H.323

Unified CVP uses the following system components:

Cisco Voice Gateway

The Cisco Voice Gateway is typically used to terminate TDM PSTN trunks and calls and to transform them into IP-based calls on an IP network. Unified CVP uses specific Cisco IOS voice gateways that support H.323 and SIP to enable more flexible call control models outside of the Unified CM MGCP control model. H.323 and SIP protocols enable Unified CVP to integrate with multiple IP and TDM architectures for Unified CCE. Voice gateways controlled by Unified CVP also provide additional functionality using the Cisco IOS built-in Voice Extensible Markup Language (VoiceXML) Browser to provide caller treatment and call queuing on the voice gateway without having to move the call to a physical device such as the IP-IVR or a third-party IVR platform. Unified CVP can also leverage the Media Resource Control Protocol (MRCP) interface of the Cisco IOS voice gateway to add automatic speech recognition (ASR) and text-to-speech (TTS) functions on the gateway under Unified CVP control.

Unified CVP Call Server

The Unified CVP Call Server provides call control signaling when calls are switched between the ingress gateway and another endpoint gateway or a Unified CCE agent. It also provides the interface to the Unified ICM VRU Peripheral Gateway and translates specific Unified ICM VRU commands into VoiceXML code that is rendered on the Unified CVP Voice Gateway. The Call Server can communicate with the gateways using H.323 or SIP as part of the solution.

Unified CVP Media Server

The Unified CVP caller treatment is provided either by using ASR/TTS functions via MRCP or with predefined .wav files stored on media servers. The media servers act as web servers and serve up the .wav files to the voice browsers as part of their VoiceXML processing. Media servers can be clustered using the Cisco Content Services Switch (CSS) products, thus allowing multiple media servers to be pooled behind a single URL for access by all the voice browsers in the network.

Unified CVP VXML Application Server

Unified CVP provides a VoiceXML service creation environment using an Eclipse toolkit browser, which is hosted on the Unified CVP VXML Application Server. This server also hosts the Unified CVP VoiceXML runtime environment, where the dynamic VoiceXML applications are executed and Java and Web Services calls are processed for external systems and database access.

H.323 Gatekeepers

Gatekeepers are used with Unified CVP to register the voice browsers and associate them with specific dialed numbers. When calls come into the network, the gateway will query the gatekeeper to find out where to send the call based upon the dialed number. The gatekeeper is also aware of the state of the voice browsers and will load-balance calls across them and avoid sending calls to out-of-service voice browsers or ones that have no available sessions.

SIP Proxy Servers

SIP Proxy Servers are used with Unified CVP to select voice browsers and associate them with specific dialed numbers. When calls come into the network, the gateway will query the SIP Proxy Server to find out where to send the call based upon the dialed number.

Availability of Unified CVP can be increased by the following methods:

Adding redundant Unified CVP Call Servers under control of the Unified ICM Peripheral Gateways, thus allowing the calls to be balanced automatically across multiple Unified CVP Call Servers.

Adding TCL scripts to the Unified CVP gateway to handle conditions where the gateway cannot contact the Unified CVP Call Server to direct the call correctly.

Adding gatekeeper redundancy with HSRP or gatekeeper clustering in H.323.

Adding a Cisco Content Services Switch (CSS) to load-balance .wav file requests across multiple Unified CVP Media Servers and VoiceXML URL access across multiple servers.


Note Calls in Unified CVP are not dropped if the Unified CVP Call Server or Unified CVP PG fails because they can be redirected to another Unified CVP Call Server on another Unified CVP-controlled gateway as part of the fault-tolerant design using TCL scripts (which are provided with the Unified CVP images) in the voice gateway.


For more information on these options, review the Unified CVP product documentation at

http://www.cisco.com/en/US/products/sw/custcosw/ps1006/tsd_products_support_series_home.html

Multi-Channel Design Considerations (Cisco Email Manager Option and Cisco Collaboration Server Option)


Note This section does not apply to the Cisco Interaction Manager, E-Mail Interaction Manager (EIM), or Web Interaction Manager (WIM) products introduced in 2007. This section refers to the Cisco E-Mail Manager (CEM) and Cisco Collaboration Server (CCS) 5.x products only, which are no longer available for new customers.


The Unified CCE solution can be extended to support multi-channel customer contacts, with email and web contacts being routed by the Unified CCE to agents in a blended or universal queue mode. The following optional components are integrated into the Unified CCE architecture (see Figure 3-10):

Media Routing Peripheral Gateway

To route multi-channel contacts, the Cisco e-Mail Manager and Cisco Collaboration Server Media Blender communicate with the Media Routing Peripheral Gateway. The Media Routing Peripheral Gateway, like any peripheral gateway, can be deployed in a redundant or duplex manner with two servers interconnected for high availability. Typically, the Media Routing Peripheral Gateway is co-located at the Central Controller and has an IP socket connection to the multi-channel systems.

Admin Workstation ConAPI Interface

The integration of the Cisco multi-channel options allows for the Unified ICM and optional systems to share configuration information about agents and their related skill groups. The Configuration Application Programming Interface (ConAPI) runs on an Administrative Workstation and can be configured with a backup service running on another Administrative Workstation.

Agent Reporting and Management (ARM) and Task Event Services (TES) Connections

ARM and TES services provide call (ARM) and non-voice (TES) state and event notification from the Unified CCE CTI Server to the multi-channel systems. These connections provide agent information to the email and web environments as well as accepting and processing task requests from them. The connection is a TCP/IP socket that connects to the agent's associated CTI Server, which can be deployed as a redundant or duplex pair on the Agent Peripheral Gateway.

Figure 3-10 Multi-Channel System

Recommendations for high availability:

Deploy the Media Routing Peripheral Gateways in duplex pairs.

Deploy ConAPI as a redundant pair of Administrative Workstations that are not used for configuration and scripting, so that they will be less likely to be shut off or rebooted. Also consider using the HDS servers at the central sites to host this function.

Deploy the Unified CCE Agent Peripheral Gateways and CTI Servers in duplex pairs.

Cisco Email Manager Option

The Cisco Email Manager is integrated with Unified CCE to provide full email support in the multi-channel contact center with Unified CCE. It can be deployed using a single server (see Figure 3-11) for small deployments or with multiple servers to meet larger system design requirements. The major components of Cisco Email Manager are:

Cisco Email Manager Server — The core routing and control server; it is not redundant.

Cisco Email Manager Database Server — The server that maintains the online database of all email and configuration and routing rules in the system. It can be co-resident on the Cisco Email Manager server for smaller deployments or on a dedicated server for larger systems.

Cisco Email Manager UI Server — This server allows the agent user interface (UI) components to be off-loaded from the main Cisco Email Manager server to scale for larger deployments or to support multiple Unified Mobile Agent (Unified MA) sites. Each remote site could have a local UI Server to reduce the data traffic from the agent browser clients to the Cisco Email Manager server. Additionally, multiple UI servers could be configured for agents to have a redundant/secondary path to access the email application. (See Figure 3-12.)

Figure 3-11 Single Cisco Email Manager Server

Figure 3-12 Multiple UI Servers

Cisco Collaboration Server Option

The Cisco Collaboration Server is integrated with Unified CCE to provide web chat and co-browsing support in the multi-channel contact center with Unified CCE. The major components of the Cisco Collaboration Server are (see Figure 3-13):

Cisco Collaboration Server — Collaboration servers are deployed outside the corporate firewall in a demilitarized zone (DMZ) with the corporate web servers they support. The Collaboration Server typically supports up to 400 concurrent sessions, but multiple servers can be deployed to handle larger contact volume or to provide a backup collaboration server for agents to access if their primary server fails.

Cisco Collaboration Server Database Server — This server maintains the online database of all chat and browsing sessions as well as configuration and routing rules in the system. It can be co-resident on the Cisco Collaboration Server; however, because the Cisco Collaboration Server is outside the firewall, most enterprises deploy it on a separate server inside the firewall to protect the historical data in the database. Multiple Cisco Collaboration Servers can point to the same database server to reduce the total number of servers required for the solution. For redundancy, each collaboration server could also have its own dedicated database server.

Cisco Collaboration Server Media Blender — This server polls the collaboration servers to check for new requests, and it manages the Media Routing and CTI/Task interfaces to connect the agent and caller. Each Unified CCE Agent Peripheral Gateway will have its own Media Blender, and each Media Blender will have a Media Routing peripheral interface manager (PIM) component on the Media Routing Peripheral Gateway.

Cisco Collaboration Dynamic Content Adaptor (DCA) — This server is deployed in the DMZ with the collaboration server, and it allows the system to share content that is generated dynamically by programs on the web site (as opposed to static HTTP pages). Multiple DCA servers can be configured and called from the Collaboration Server(s) for redundancy as well.

Figure 3-13 Cisco Collaboration Server

Cisco Multi-Channel Options with the Cisco Interaction Manager: E-Mail Interaction Manager (EIM) and Web Interaction Manager (WIM)

In 2007, Cisco introduced the replacement for the 5.x versions of the Multi-Channel products: Cisco E-Mail Manager (CEM) and Cisco Collaboration Server (CCS). These original products were two separate products that had their own integration methods and web interface for the agents and administrators. The new Cisco Interaction Manager (CIM) platform is a single application that provides both E-Mail and Web interaction management using a common set of web servers and pages for agents and administrators. The new offering is designed for integration with the Unified CCE platform to provide universal queuing of contacts to agents from different media channels.

For additional design information about the Interaction Manager platform, refer to the Cisco Unified Web and E-Mail Interaction Manager Solution Reference Network Design (SRND) Guide for Unified Contact Center Enterprise, Hosted, and ICM, available at

http://www.cisco.com/en/US/products/ps7236/products_implementation_design_guides_list.html


Note The Cisco Interaction Manager (EIM/WIM) 4.2(1) release is not supported with Unified ICM or Unified Contact Center Enterprise 7.5(1). For more information, refer to the compatibility matrix at http://www.cisco.com/en/US/docs/voice_ip_comm/cust_contact/contact_center/icm_enterprise/compatibilty_matrix/guide/ipcc75compat.pdf.


Cisco Interaction Manager Architecture Overview

The Cisco Interaction Manager has several core components, as illustrated in Figure 3-14.

Figure 3-14 Cisco Interaction Manager Architecture

The architecture is defined by a multi-tiered model, with various components at each of the following levels of the design:

External Clients

Cisco Interaction Manager is a 100% web-based product that agents and end-customers can access using a web browser from their respective desktops.

Agents can access the application using Microsoft Internet Explorer 6.0 or the embedded CAD browser, and customers can access the chat customer console using specific versions of Microsoft IE, Mozilla, Firefox, or Netscape. Cisco Interaction Manager is not supported on agent desktops running in a Citrix terminal services environment.

Tier 0: Firewall and Load Balancer

Agents and customers connect to the application from their respective browsers through a firewall, if so configured for the application.

A load balancer may also be used in case of a distributed installation of the application, so that requests from agents and customers are routed to the least-loaded web servers.

Tier 1: Web Server

The web server is used to serve static content to the browser. Cisco Interaction Manager is designed to be indifferent to the specific type of web server being used, with the single requirement being that the application server vendor must provide a web server plug-in for the corresponding application server.

Tier 2: Application and File Server

The application server is used as a web container (also known as the JSP/Servlet engine) and EJB Container. The core business logic resides in the Business Object Layer, as well as in stored procedures residing on the database server. The business logic residing in Java classes is deployed on the application server. The JSP/Servlets interact with the business objects through the business client layer, and these in turn interact with the database to execute some business logic on data present in the database server.

Example: Outbound Task Creation

User logs in to the application and creates an outbound task.

The JSP layer calls the Business Client layer, which interacts with Business Objects residing in the same application server where JSPs/Servlets are deployed.

The Business Objects execute queries and stored procedures residing on the database server.

Activities are created and stored in database tables.

The file server is used for storing all email and article attachment files, report templates and all locale-specific strings used in the application.

Tier 3: Services Server

Cisco Interaction Manager has processes that perform specific business functions, such as fetching emails from a POP server, sending emails to an SMTP server, processing workflows, assigning chats to the agents, and so forth. All services run on the Services server and are managed by the Distributed Service Manager (DSM).

Cisco Interaction Manager facilitates the creation of multiple instances of services with work distributed among the various instances. For example, the service used to retrieve emails could be configured to have multiple instances to retrieve emails from different email addresses. This capability can be used to process increasing volumes of customer interactions coming into a contact center.

Data Tier: Database Server

The data tier includes databases that are SQL-compliant, HTML/XML data-sources, and ultimately Web services that consume and produce SOAP messages. Business objects and data adapters use this layer to extract data from various third-party applications and data sources. This layer also deals with HTML and XML parsing using relevant J2EE-compliant packages to process data in other formats.

Unified CCE Integration

As part of the system integration with Unified CCE, the services server includes two additional services, the EAAS and the Listener Service. These services interact with the Media Routing (MR) PG and Agent PG components of Unified CCE, respectively, via the Media Routing (MR) and Agent Reporting and Management (ARM) interfaces.

Additionally, the application server of Cisco Interaction Manager establishes a connection with the Unified CCE Administration Workstation (AW) database server to import relevant configuration data and to map the configuration to Cisco Interaction Manager objects in the Cisco Interaction Manager database. Note that Cisco Interaction Manager does not make use of the Configuration API (ConAPI) interface.

When Cisco Interaction Manager is integrated with Unified System CCE, the multi-channel controller of Unified System CCE is installed on the services server. Additionally, for certain deployments of Unified CCE, the Media Routing (MR) PG of Unified CCE can reside on the services server.

In parent/child configurations, there is no multi-channel routing and integration through the parent ICM. Media Routing PGs need to connect to the child or Unified System CCE. A separate Cisco Interaction Manager or partition is required for each child.

Likewise, in hosted ICM/CCH environments, there is no multi-channel routing through the Network Application Manager (NAM) layer, and integration is at the individual Customer ICM (CICM) level only. The Media Routing (MR) PGs need to connect to the CICM.

High Availability Considerations for Cisco Interaction Manager

The Cisco Interaction Manager offers high availability options using additional web and application servers and using load balancing equipment to distribute agents and contact work more evenly across the platform as well as to provide for failover in redundancy models.

Load Balancing Considerations

The web service component of a Cisco Interaction Manager deployment can be load balanced to serve a large number of agents accessing the application at the same time. The web (or Web/Application) servers can be configured behind the load balancer with Virtual IP, and an agent can access Cisco Interaction Manager through Virtual IP. Depending on the selected load balancing algorithm, the load balancer will send a request to one of the web/application servers behind it and send a response back to the agent. In this way, from a security perspective, the load balancer serves as a reverse proxy server too.

One of the most essential steps in configuring a load balancer is enabling sticky sessions with cookie-based persistence. After every scheduled maintenance task, before access is opened for users, Cisco recommends verifying that all web/application servers are available to share the load. Otherwise, the first web/application server could be overloaded because of the sticky connection feature. With other configurable parameters, you can define a load-balancing algorithm to meet various objectives such as equal load balancing, isolation of the primary web/application server, or sending fewer requests to a low-powered web/application server.

The load balancer monitors the health of all web/application servers in the cluster. If a problem is observed, the load balancer removes the given web/application server from the available pool of servers, thus preventing new web requests from being directed to the problematic web/application server.
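The interaction between sticky sessions and health monitoring described above can be made concrete with a small model. The following is a minimal sketch of cookie-style session pinning with a round-robin fallback. The class, method, and server names are hypothetical, and it does not represent the behavior or configuration of any specific load balancer product.

```python
from typing import Dict, List, Optional


class StickyBalancer:
    """Toy model: pin each session to one server, skip unhealthy servers."""

    def __init__(self, servers: List[str]) -> None:
        self.healthy: Dict[str, bool] = {s: True for s in servers}
        self.sessions: Dict[str, str] = {}  # session cookie -> pinned server
        self._next = 0                      # round-robin counter

    def mark_health(self, server: str, healthy: bool) -> None:
        """Record the result of a health check for one server."""
        self.healthy[server] = healthy

    def route(self, session_id: str) -> Optional[str]:
        pinned = self.sessions.get(session_id)
        if pinned is not None and self.healthy.get(pinned, False):
            return pinned  # sticky: keep using the same server
        pool = [s for s, ok in self.healthy.items() if ok]
        if not pool:
            self.sessions.pop(session_id, None)
            return None
        # New or displaced session: pick the next healthy server.
        # (In the real product a displaced user would have to log in again.)
        server = pool[self._next % len(pool)]
        self._next += 1
        self.sessions[session_id] = server
        return server


lb = StickyBalancer(["web1", "web2"])
lb.mark_health("web2", False)            # web2 not yet back from maintenance
early = [lb.route(f"agent{i}") for i in range(3)]
lb.mark_health("web2", True)             # web2 returns after users logged in
print(early, lb.route("agent0"))
```

In this run every early session is pinned to the only available server and stays there after the second server returns, which is the overload risk that the recommendation to verify all servers before opening access is meant to avoid.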

Managing Failover

Cisco Interaction Manager supports clustered deployments. This ensures high availability and performance via transparent replication, load balancing, and failover. The following key methods are available for handling failure conditions within a Cisco Interaction Manager and Unified CCE integrated deployment:

Implementing multiple Web/App servers. If the primary server goes down, the load balancer can help handle the failure by routing requests to alternate Web/App servers. The load balancer detects application server failure and redirects requests to another application server, after which a new user session will be created and users will have to log in again to the Cisco Interaction Manager.

Allowing servers to be dynamically added or removed from the online cluster to accommodate external changes in demand or internal changes in infrastructure.

Allowing Cisco Interaction Manager services to fail over with duplexed Unified CCE components (for example, MR PIM and Agent PIM of the MR PG and Agent PG, respectively) to eliminate downtime of the application in failure circumstances.

The single points of failure in Cisco Interaction Manager include the following:

The primary Web/App server of Cisco Interaction Manager going down (This is the centralized server for JMS message exchange.)

The Services server going down

The Database server going down

Cisco Unified Outbound Option Design Considerations

The Cisco Unified Outbound Option provides the ability for Unified CCE to place calls on behalf of agents to customers based upon a predefined campaign. The major components of the Unified Outbound Option are (see Figure 3-15):

Outbound Option Campaign Manager — A software module that manages the dialing lists and rules associated with the calls to be placed. This software is loaded on the Logger Side A platform and is not redundant; it can be loaded and active on only Logger A of the duplex pair of Loggers in the Unified CCE system.

Outbound Option Dialer — A software module that performs the dialing tasks on behalf of the Campaign Manager. In Unified CCE, the Outbound Option Dialer emulates a set of IP phones for Unified CM to make the outbound calls, and it detects the called party and manages the interaction tasks with the CTI OS server to transfer the call to an agent. It also interfaces with the Media Routing Peripheral Gateway, and each Dialer has its own peripheral interface manager (PIM) on the Media Routing Peripheral Gateway.

Media Routing Peripheral Gateway — A software component that is designed to accept route requests from "non-inbound voice" systems such as the Unified Outbound Option or the Multi-Channel products. In the Unified Outbound Option solution, each Dialer communicates with its own peripheral interface manager (PIM) on the Media Routing Peripheral Gateway.

Figure 3-15 Unified CCE Unified Outbound Option

The system can support multiple dialers across the enterprise, all of which are under control of the central Campaign Manager software. Although they do not function as a redundant or duplex pair the way a Peripheral Gateway does, with a pair of dialers under control of the Campaign Manager, a failure of one of the dialers can be handled automatically and calls will continue to be placed and processed by the surviving dialer. Any calls that were already connected to agents would remain connected and would experience no impact from the failure.

In all deployments, the Dialers are co-resident on the Unified CCE Peripheral Gateway for Unified CM. In Unified System CCE 7.5(x), the Outbound Controller can be installed on the Agent Controller as well to reduce the number of servers required in the System deployment model.

Recommendations for high availability:

Deploy the Media Routing Peripheral Gateways in duplex pairs.

Deploy multiple Dialers with one per side of the Duplex Unified CCE Peripheral Gateway, and make use of them in the Campaign Manager to allow for automatic fault recovery to a second Dialer in the event of a failure. There are two options with multiple Dialers: a second Dialer can be configured with the same number of ports (100% redundancy), or the ports can be split across the two Dialers since they operate independently and would both be active at the same time. In designs with a small number of Dialer ports, splitting them can impact the performance of the campaign.

Include Dialer phones (virtual phones in Unified CM) in redundancy groups in Unified CM to allow them to fail over to a different subscriber, as would any other phone or device in the Unified CM cluster.

Deploy redundant voice gateways for outbound dialing to ensure that the dialers have enough available trunks to place calls in the event of a voice gateway failure. In some instances where outbound is the primary application, these gateways would be dedicated to outbound calling only.
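The two Dialer port options in the recommendations above (a second Dialer with the same number of ports, or the ports split across two Dialers) differ in how much dialing capacity survives a single Dialer failure. The sketch below works through that arithmetic. The function names and the figure of 96 required ports are hypothetical and chosen only for illustration.

```python
from typing import Tuple


def dialer_layout(required_ports: int, split: bool) -> Tuple[int, int]:
    """Return the port count configured on each of the two Dialers.

    split=False: each Dialer carries the full requirement (100% redundancy).
    split=True:  the requirement is divided across the two Dialers.
    """
    if split:
        first = (required_ports + 1) // 2
        return (first, required_ports - first)
    return (required_ports, required_ports)


def worst_case_surviving_ports(layout: Tuple[int, int]) -> int:
    """Ports still available if the larger of the two Dialers fails."""
    return min(layout)


required = 96  # hypothetical campaign requirement
for split in (False, True):
    layout = dialer_layout(required, split)
    print(split, layout, worst_case_surviving_ports(layout))
```

With the split option, a failure leaves roughly half the ports, which is why splitting a small port count can affect campaign performance.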

Peripheral Gateway Design Considerations

The Agent PG uses the Unified CM CTI Manager process to communicate with the Unified CM cluster, with a single Peripheral Interface Manager (PIM) controlling agent phones and CTI route points anywhere in the cluster. The Peripheral Gateway PIM process registers with CTI Manager on one of the Unified CM servers in the cluster, and the CTI Manager accepts all JTAPI requests from the PG for the cluster. If the phone, route point, or other device being controlled by the PG is not registered to that specific Unified CM server in the cluster, the CTI Manager forwards that request via Unified CM SDL links to the other Unified CM servers in the cluster. There is no need for a PG to connect to multiple Unified CM servers in a cluster.

Multiple PIM Connections to a Single Unified CM Cluster

Although the Agent PG in this document is described as typically having only one PIM process that connects to the Unified CM cluster, the Agent PG can manage multiple PIM interfaces to the same Unified CM cluster, which can be used to create additional peripherals within Unified CCE for two purposes:

Improving Failover Recovery for Customers with Large Numbers of CTI Route Points

Scaling the Unified CCE PG Beyond 2,000 Agents per Server

Improving Failover Recovery for Customers with Large Numbers of CTI Route Points

When a Unified CCE PG fails over, the PIM connection that was previously controlling the Unified CM cluster is disconnected from its CTI Manager, and the duplex or redundant side of the PG will attempt to connect its PIM to the cluster using a different CTI Manager and Subscriber. This process requires the new PIM connection to register for all of the devices (phones, CTI Route Points, CTI Ports, and so forth) that are controlled by Unified CCE on the cluster. When the PIM makes these registration requests, all of them must be confirmed by the Unified CM before the PIM can go into an active state and process calls.

To help recover more quickly, the Unified CCE PG can have a PIM created that is dedicated to the CTI Route Points for the customer, thus allowing this PIM to register for these devices at a rate of approximately five per second and allowing the PIM to activate and respond to calls hitting these CTI Route points faster than if the PIM had to wait for all of the route points, then all the agent phones, and all the CTI ports. This dedicated CTI Route Point PIM could become active several minutes sooner and be able to respond to new inbound calls, directing them to queuing or treatment resources while waiting for the Agent PIM with the phones and CTI Ports to complete the registration process and become active.

This does not provide any additional scaling or other benefits for the design; the only purpose is to allow Unified CM to have the calls on the CTI Route Points serviced faster by this dedicated PIM. It should be used only with customers who have more than 250 Route Points because anything less does not provide a reasonable improvement in recovery time. Additionally, only the CTI Route Points that would be serviced by Unified CCE should be associated with this PIM, and it should have its own dedicated CTI-Enabled JTAPI or PGUser specific to the CTI Route Point PIM.
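A rough estimate shows why a dedicated CTI Route Point PIM can become active minutes earlier. The sketch below uses the approximate rate of five registrations per second stated above. It assumes, purely for illustration, that registrations are sequential and that the same rate applies to phones and CTI ports on the Agent PIM. The device counts are hypothetical, and actual recovery times depend on the deployment.

```python
def registration_seconds(device_count: int, devices_per_second: float = 5.0) -> float:
    """Estimated time for a PIM to register `device_count` devices.

    The default rate is the approximate figure from the text; applying it
    to every device type is an assumption of this sketch.
    """
    if devices_per_second <= 0:
        raise ValueError("devices_per_second must be positive")
    return device_count / devices_per_second


# Hypothetical deployment: 250 route points, 2,000 phones, 500 CTI ports.
route_points, phones, cti_ports = 250, 2000, 500

dedicated_pim = registration_seconds(route_points)
single_pim = registration_seconds(route_points + phones + cti_ports)

print(f"dedicated route point PIM: {dedicated_pim:.0f} s")
print(f"single PIM for all devices: {single_pim:.0f} s")
```

Under these assumptions the dedicated PIM finishes registering its route points in under a minute, while a single PIM covering every device would take several minutes before it could answer calls arriving at the route points.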


Note This configuration is not supported in the Unified System CCE model.


Scaling the Unified CCE PG Beyond 2,000 Agents per Server

In Unified CCE 7.5(x), a new feature has been enabled to allow multiple PIMs in the same physical PG server to be used to connect either to the same Unified CM cluster or to a second Unified CM cluster. This design reduces the physical number of PG servers required in the Unified CCE design. This is different from the recovery strategy for multiple PIMs because both of these PIMs would be configured with up to 2,000 concurrent agents and their related CTI Route Points and CTI Ports as needed to support those agents. The additional PIM will create another Peripheral from the ICM's perspective, which might impact routing and reporting. Additionally, agent teams and supervisors cannot cross peripherals, so careful consideration must be given to which agent groups are allocated to each PIM/Peripheral in such a design.

In designs where Unified CCE is deployed with Unified CVP, the Cisco Unified Communications Sizing Tool might show that the Unified CM cluster can support more than 2,000 total agents; however, the CTI Manager and JTAPI interfaces are tested and supported with a maximum of only 2,000 agents. In order to allow for a design that could have a single Unified CM cluster with more than 2,000 agents, a second Agent PIM can be configured to support the additional agents (up to a total of 4,000 agents per PG).
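The limits stated above (2,000 agents per PIM and up to 4,000 agents per PG, which implies two Agent PIMs per PG) translate into a simple sizing calculation. The following is a minimal sketch with hypothetical function and constant names. It counts logical PGs only, ignores the duplex pairing of each PG, and is not a substitute for the Cisco Unified Communications Sizing Tool.

```python
import math
from typing import Tuple

AGENTS_PER_PIM = 2000  # tested and supported maximum per CTI Manager/JTAPI interface
PIMS_PER_PG = 2        # inferred from "up to a total of 4,000 agents per PG"


def pims_and_pgs(concurrent_agents: int) -> Tuple[int, int]:
    """Return (agent PIMs needed, PGs needed) for a concurrent agent count."""
    if concurrent_agents < 0:
        raise ValueError("concurrent_agents must not be negative")
    pims = math.ceil(concurrent_agents / AGENTS_PER_PIM)
    pgs = math.ceil(pims / PIMS_PER_PG)
    return pims, pgs


for agents in (1500, 3500, 4001):
    print(agents, pims_and_pgs(agents))
```

Each additional PIM is a separate peripheral, so the result also indicates how many groups the agent teams and supervisors must be divided into, since teams cannot cross peripherals.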


Note This configuration is not supported in the Unified System CCE model.


Figure 3-16 illustrates a single Unified CCE PG with two different PIMs pointing to the same Unified CM cluster.

Figure 3-16 Two PIMs Configured to the Same Unified CM Cluster


Note In order to size the Unified CM cluster properly for Unified CCE, you must use the Cisco Unified Communications Sizing Tool (Unified CST).


Redundant/Duplex Unified CCE Peripheral Gateway Considerations

Unified CCE Agent PGs are deployed in a redundant/duplex configuration because the PG has only one connection to the Unified CM cluster using a single CTI Manager. If that CTI Manager were to fail, the PG would no longer be able to communicate with the Unified CM cluster. Adding a redundant or duplex PG allows the Unified ICM to have a second pathway or connection to the Unified CM cluster using a second CTI Manager process on a different Unified CM server in the cluster.

The minimum requirement for Unified ICM high-availability support for CTI Manager and Unified IP IVR is a duplex (redundant) Agent PG environment with one Unified CM cluster containing at least two subscribers. Therefore, the minimum configuration for a Unified CM cluster in this case is one publisher and two subscribers. This minimum configuration ensures that, if the primary subscriber fails, the devices will re-home to the secondary subscriber and not to the publisher for the cluster. (See Figure 3-17.) In smaller systems and labs, Cisco permits a single publisher and single subscriber, which means if the subscriber fails, then all the devices will be active on the publisher. For specific details about the number of recommended Unified CM servers, see Sizing Cisco Unified Communications Manager Servers, page 11-1.

Figure 3-17 Unified ICM High Availability with One Unified CM Cluster

To simplify the illustration in Figure 3-17, the ICM Server or ICM Central Controller is represented as a single server, but it is actually a set of servers sized according to the Unified CCE agent count and call volume. The ICM Central Controllers include the following redundant/duplex servers:

Call Router — The "brain" of the ICM complex that provides intelligent call routing instructions based on real-time conditions it maintains in memory across both the A-Side and B-Side Call Router processes.

Logger/Database Server — The repository for all configuration and scripting information as well as historical data collected by the system. The Loggers are "paired" with their Call Routers such that Call Router Side A will read and write data only to the Logger A, and the Call Router B will read and write only to the Logger B. Because both sides of the Call Router processes are synchronized, the data written to both Loggers is identical.

In specific deployment models, these two components can be installed on the same physical server, which is referred to as a Rogger, or combined Router/Logger. Refer to the chapter on Sizing Unified CCE Components and Servers, page 10-1, for more details on these specific configurations.

Unified CM JTAPI and Peripheral Gateway Failure Detection

A heartbeat mechanism detects failures between the Unified CM JTAPI link and the Peripheral Gateway. Unlike the ICM heartbeat methods, which use TCP keep-alive messages on the open socket ports, this method uses a specific heartbeat message in the JTAPI messaging protocol between the systems. By default, the heartbeat messages are sent every 30 seconds, and the communications path is reset by the Unified CM or Peripheral Gateway after missing two consecutive heartbeat messages.

This failure detection can be enhanced by using the following procedure to change the heartbeat interval on the JTAPI Gateway client that runs on the Peripheral Gateway:


Step 1 From the Start Menu of the Peripheral Gateway, select Programs -> Cisco JTAPI -> JTAPI Preferences.

Step 2 Set the Advanced -> Server Heartbeat Interval (sec) field to 5 seconds.


Cisco recommends that you do not set this value lower than five seconds because it might impact system performance and trigger an inappropriate failover. This setting determines how often the heartbeats are generated. If it is set to five seconds, the system will fail-over this connection within ten seconds of a loss of network connection because it must detect two consecutive missed heartbeats. The default of 30 seconds takes up to one minute (60 seconds) to take action on a network connection failure.

Because this JTAPI connection between the Peripheral Gateway and Unified CM is supported only locally on the same LAN segment, there should not be an issue with latency for this heartbeat value. However, if there are any additional network hops, firewalls, or other devices that cause delay between these two components, then the heartbeat interval value should be set accordingly to account for this possible condition.
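The relationship between the heartbeat interval and the failover time follows directly from the rule of two consecutive missed heartbeats. The short Python sketch below works through that arithmetic. The function and constant names are hypothetical and do not correspond to any JTAPI setting name; only the values (two missed heartbeats, five-second minimum, 30-second default) come from this section.

```python
# Values taken from the text above; the names are hypothetical.
MISSED_HEARTBEATS_BEFORE_RESET = 2
MIN_RECOMMENDED_INTERVAL_SEC = 5
DEFAULT_INTERVAL_SEC = 30


def worst_case_detection_seconds(interval_sec: int) -> int:
    """Upper bound on the time needed to act on a lost JTAPI connection."""
    if interval_sec < MIN_RECOMMENDED_INTERVAL_SEC:
        raise ValueError("intervals below five seconds are not recommended")
    return interval_sec * MISSED_HEARTBEATS_BEFORE_RESET


if __name__ == "__main__":
    print(worst_case_detection_seconds(MIN_RECOMMENDED_INTERVAL_SEC))  # 10
    print(worst_case_detection_seconds(DEFAULT_INTERVAL_SEC))          # 60
```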

Unified ICM Redundancy Options

Duplex/Redundant Unified ICM servers can be located at the same physical site or can be geographically distributed. This applies specifically to the Central Controller (Call Router/Logger) and Peripheral Gateways.

Under normal operations, the Unified ICM Call Router and Logger/Database Server processes are interconnected through a Private Network connection that is isolated from the Visible/Public Network segment. These servers should be configured with a second NIC card for the Private Network connection, and the Private connections should be isolated from the rest of the Visible/Public Network in their own Cisco Catalyst switch if they are located at the same physical site. If the Central Controllers are geographically separated (located at two different physical sites), under normal operations the same Private Network connections must continue to be isolated and connected between the two physical sites with a separate WAN connection. For normal operations, this Private Network connection should not be provisioned on the same circuits or network gear as the Visible/Public Network WAN connection because that would create a single point of failure that could disable both WAN segments at the same time.

The Unified ICM Peripheral Gateway duplex pair of servers is also interconnected through a Private Network connection that is isolated from the Visible/Public Network segment under normal operations. If the two sides of the duplex pair (Side A and Side B) are both at the same physical site, the Private Network can be created by using an Ethernet Cross-Over Cable between the two servers to interconnect their Private Network NIC cards. If the two servers in the duplex pair are geographically distributed (located at two different physical sites), the Private Network connections must be connected with a separate WAN connection between the two physical sites. This Private Network connection should not be provisioned on the same circuits or network gear as the Visible/Public Network WAN connection because that would create a single point of failure that could disable both WAN segments at the same time.

For additional details on the ICM network requirements for this connection, refer to the Unified ICM Installation Guide, available at

http://www.cisco.com/en/US/products/sw/custcosw/ps1001/prod_installation_guides_list.html

For additional details on the Unified ICM network requirements for clustering over the WAN, see the section on IPT: Clustering Over the WAN, page 2-32.

Within the Agent PG, two software processes are run to manage the connectivity to the Unified CM cluster:

JTAPI Gateway

The JTAPI Gateway is installed on the PG by downloading it from the Unified CM cluster at the time of the PG installation. This ensures compatibility with the JTAPI and CTI Manager versions in the system. Note that, when either the PG or Unified CM is upgraded, this JTAPI Gateway component must be removed and re-installed on the PG.

The JTAPI Gateway is started by the PG automatically and runs as a node-managed process, which means that the PG will monitor this process and automatically restart it if it should fail for any reason. The JTAPI Gateway handles the low-level JTAPI socket connection protocol and messaging between the PIM and the Unified CM CTI Manager.

Agent PG Peripheral Interface Manager (PIM)

The PIM is also a node-managed process and is monitored for unexpected failures and automatically restarted. This process manages the higher-level interface between the Unified ICM and the JTAPI Gateway and Unified CM cluster, requesting specific objects to monitor and handling route requests from the Unified CM cluster.

In a duplex Agent PG environment, both JTAPI services from both Agent PG sides log into the CTI Manager upon initialization. Unified CM PG side A logs into the primary CTI Manager, while PG side B logs into the secondary CTI Manager. However, only the active side of the Unified CM PG registers monitors for phones and CTI route points. The duplex Agent PG pair works in hot-standby mode, with only the active PG side PIM communicating with the Unified CM cluster. The standby side logs into the secondary CTI Manager only to initialize the interface and make it available for a failover. The registration and initialization services of the Unified CM devices take a significant amount of time, and having the CTI Manager available significantly decreases the time for failover.

In duplex PG operation, the side that goes active is the PG side that is first able to connect to the Unified ICM Call Router Server and request configuration information. It is not deterministic based upon the side-A or side-B designation of the PG device, but it depends only upon the ability of the PG to connect to the Call Router, and it ensures that only the PG side that has the best connection to the Call Router will attempt to go active.

The startup process of the PIM requires that all of the CTI route points be registered first, which is done at a rate of 5 route points per second. For systems with a lot of CTI route points (for example, 1000), this process can take roughly 3 minutes (1000 route points at 5 per second is 200 seconds) to complete before the system will allow any of the agents to log in. This time can be reduced by distributing the devices over multiple PIM interfaces to the Unified CM cluster, as noted above.
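The startup delay can be estimated from the registration rate given above. The following Python sketch is an approximation under two assumptions that the text does not state: route points are spread evenly across the PIMs, and the PIMs register their route points in parallel. The function name is hypothetical.

```python
import math

ROUTE_POINTS_PER_SECOND = 5  # registration rate stated in the text


def registration_seconds(route_points: int, pims: int = 1) -> float:
    """Estimate the time before agents can log in.

    Assumes an even split of route points across PIMs and parallel
    registration by each PIM (both are assumptions, not documented behavior).
    """
    if route_points < 0 or pims < 1:
        raise ValueError("route_points must be >= 0 and pims must be >= 1")
    busiest_pim = math.ceil(route_points / pims)
    return busiest_pim / ROUTE_POINTS_PER_SECOND


if __name__ == "__main__":
    print(registration_seconds(1000))          # one PIM
    print(registration_seconds(1000, pims=2))  # two PIMs, under the assumptions above
```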

In the event that calls arrive at the CTI Route Points in Unified CM but the PIM is not yet fully operational, these calls will fail unless these route points are configured with a recovery number in their "Call Forward on Unregistered" or "Call Forward on Failure" setting. These recovery numbers could be the Cisco Unity voicemail system for the Auto Attendant, or perhaps the company operator position, to ensure that the incoming calls are being answered.

Unified CM Failure Scenarios

A fully redundant Unified CCE system contains no single points of failure. However, there are scenarios where a combination of multiple failures can reduce Unified CCE system functionality and availability. Also, if a component of the Unified CCE solution does not itself support redundancy and failover, existing calls on that component will be dropped. The following failure scenarios have the most impact on high availability, and Unified CM Peripheral Interface Managers (PIMs) cannot activate if either of the following failure scenarios occurs (see Figure 3-18):

Agent PG/PIM side A and the secondary CTI Manager that services the PG/PIM on side B both fail.

Agent PG/PIM side B and the primary CTI Manager that services the PG/PIM on side A both fail.

In either of these cases, the Unified ICM will not be able to communicate with the Unified CM cluster.
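The two failure combinations above follow from the fact that each PG side uses only its own CTI Manager and cannot cross-connect to the other one. A minimal Python sketch of that rule is shown below; the function and parameter names are hypothetical and model only the pairing described in this section.

```python
def icm_can_reach_cluster(pg_a_up: bool, pg_b_up: bool,
                          primary_cti_up: bool, secondary_cti_up: bool) -> bool:
    """Return True if at least one PG side can reach its own CTI Manager.

    PG side A is paired with the primary CTI Manager and PG side B with
    the secondary CTI Manager; there is no cross-connection.
    """
    side_a_path = pg_a_up and primary_cti_up
    side_b_path = pg_b_up and secondary_cti_up
    return side_a_path or side_b_path


if __name__ == "__main__":
    # PG side A and the secondary CTI Manager both fail: no usable path remains.
    print(icm_can_reach_cluster(False, True, True, False))
    # Only PG side A fails: side B can still use the secondary CTI Manager.
    print(icm_can_reach_cluster(False, True, True, True))
```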

Figure 3-18 Unified CM PGs Cannot Cross-Connect to Backup CTI Managers

Unified ICM Failover Scenarios

This section describes how redundancy works in the following failure scenarios:

Scenario 1: Unified CM and CTI Manager Fail

Scenario 2: Agent PG Side A Fails

Scenario 3: The Unified CM Active Call Processing Subscriber Fails

Scenario 4: The Unified CM CTI Manager Providing JTAPI Services to the Unified CCE PG Fails

Scenario 1: Unified CM and CTI Manager Fail

Figure 3-19 shows a complete system failure or loss of network connectivity on Cisco Unified CM subscriber A. The CTI Manager and Cisco CallManager services were initially both active on this same server, and Unified CM subscriber A is the primary CTI Manager in this case. The following conditions apply to this scenario:

All phones and gateways are registered with Unified CM subscriber A as the primary server.

All phones and gateways are configured to re-home to Unified CM subscriber B (that is, B is the backup server as part of the redundancy group in Unified CM).

Unified CM subscribers A and B are each running a separate instance of CTI Manager within the same Unified CM cluster.

When Unified CM subscriber A or its CCM.exe process fails, all registered phones and gateways re-home to Unified CM subscriber B. Calls that are in progress with agent phones will remain active, but the agents will not be able to use phone services such as conference or transfer until they hang up the call and their phone re-registers with the backup subscriber. Although the call stays active, Unified CCE loses visibility to the call and will write a Termination Call Detail (TCD) record to the Unified ICM database for the call at the time of the failure, and no additional call data such as wrap-up codes will be written about the call after that point. Phones that are not active on a call will re-home automatically.

PG side A detects a failure and induces a failover to PG side B.

Depending on the configuration of the Peripheral in Unified ICM, the CTI OS or CAD server will keep the agent logged in but "gray out" their desktop controls until the PG has completed its failover processing. The agents might not have to log in again but might have to manually make themselves "ready" or "available" to ensure they are aware the call processing functionality has been restored.

PG side B becomes active and registers all dialed numbers and phones, and call processing continues.

As noted above, when the PG fails-over, the ICM Call Router will write a Termination Call Detail Record (TCD) in the ICM database for any active calls. If the call is still active when the PG fails-over to the other side, a second TCD record will be written for this call as if it were a "new" call in the system and not connected to the prior call that was recorded in the database.

When Unified CM subscriber A recovers, all idle phones and gateways re-home to it. Active devices wait until they are idle before re-homing to the primary subscriber.

PG side B remains active, using the CTI Manager on Unified CM subscriber B.

After recovery from the failure, the PG does not fail back to the A side of the duplex pair. All CTI messaging is handled using the CTI Manager on Unified CM subscriber B, which communicates with Unified CM subscriber A to obtain phone state and call information.

Figure 3-19 Scenario 1 - Unified CM and CTI Manager Fail

Scenario 2: Agent PG Side A Fails

Figure 3-20 shows a failure on PG side A and a failover to PG side B. All CTI Manager and Unified CM services continue running normally. The following conditions apply to this scenario:

All phones and gateways are registered with Unified CM subscriber A.

All phones and gateways are configured to re-home to Unified CM subscriber B (that is, B is the backup server); however, they do not need to re-home as the primary subscriber continues to be functional.

Unified CM subscribers A and B are each running a local instance of CTI Manager.

When PG side A fails, PG side B becomes active.

PG side B registers all dialed numbers and phones, and call processing continues. Phones and gateways stay registered and operational with Unified CM subscriber A; they do not fail-over.

Calls in progress on agent phones will stay in progress, but with no third-party call control (conference, transfer, and so forth) available from the agent desktop softphones. Agents who were not on calls may notice that their CTI desktop disables the agent state or third-party call control buttons during the failover to the B-Side PG. Once the failover is complete, the agent desktop buttons are restored. However, barge-in and conference calls will not be rebuilt properly, and calls will disappear from the desktop when either of the participants drops out of the call.

When the PG fails-over, the ICM Call Router will write a Termination Call Detail Record (TCD) in the ICM database for any active calls. If the call is still active when the PG fails-over to the other side, a second TCD record will be written for this call as if it were a "new" call in the system and not connected to the prior call that was recorded in the database.

When PG side A recovers, PG side B remains active and uses the CTI Manager on Unified CM subscriber B. The PG will not fail-back to the A-Side, and call processing will continue on the PG Side B.

Figure 3-20 Scenario 2 - Agent PG Side A Fails

Scenario 3: The Unified CM Active Call Processing Subscriber Fails

Figure 3-21 shows a failure on Unified CM active call processing subscriber A. In this model, the subscriber is actively processing calls and controlling devices but does not provide the CTI Manager connection to the Unified CCE PG. The CTI Manager services are running on all the Unified CM subscribers in the cluster, but only the subscribers C and D are configured to communicate with the Unified CCE Peripheral Gateway.

The following conditions apply to this scenario:

All phones and gateways are registered with Unified CM subscriber A.

All phones and gateways are configured to re-home to Unified CM subscriber B (that is, B is the backup server).

Unified CM subscribers C and D are each running a local instance of CTI Manager to provide JTAPI services for the Unified CCE PGs.

If Unified CM subscriber A fails, phones and gateways re-home to the backup Unified CM subscriber B.

PG side A remains connected and active, with a CTI Manager connection on Unified CM subscriber C. It does not fail-over because the JTAPI-to-CTI Manager connection has not failed. However, it will see the phones and devices being unregistered from Unified CM subscriber A (where they were registered) and will then be notified of these devices being re-registered on Unified CM subscriber B automatically. During the time that the agent phones are not registered, the PG will disable the agent CTI desktops to prevent the agents from attempting to use the system while their phones are not actively registered with a Unified CM subscriber. Also, they will be put into "not ready" state by the system during this transition to avoid routing calls to them as well.

Call processing continues for any devices not registered to Unified CM subscriber A. Call processing also continues for those devices on subscriber A when they are re-registered with their backup subscriber.

Calls in progress on phones registered to Unified CM subscriber A will continue; however, the agent desktop will be disabled to prevent any conference, transfer, or other third-party call control during the failover. After the agent disconnects the active call, that agent's phone will re-register with the backup subscriber.

As noted above, when the Unified CM subscriber A fails, the calls in progress stay active; however, the ICM loses control and track of those calls because the phone has not re-homed (re-registered) with the backup subscriber in the cluster. In fact, the phone will not re-home until after the current call is completed. The ICM Call Router will write a Termination Call Detail Record (TCD) in the ICM database for calls that were active at the time of the subscriber failure, with call statistics up to the time of the failure and loss of control. Any additional call information (statistics, call wrap-up data, and so forth) will not be written to the ICM database.

When Unified CM subscriber A recovers, phones and gateways re-home to it. This re-homing can be set up on Unified CM to gracefully return groups of phones and devices over time or to require manual intervention during a maintenance window to minimize the impact to the call center. During this re-homing process, the CTI Manager service will notify the Unified CCE Peripheral Gateway of the phones being unregistered from the backup Unified CM subscriber B and re-registered with the original Unified CM subscriber A.

Call processing continues normally after the phones and devices have returned to their original subscriber.

Figure 3-21 Scenario 3 - Only the Primary Unified CM Subscriber Fails

Scenario 4: The Unified CM CTI Manager Providing JTAPI Services to the Unified CCE PG Fails

Figure 3-22 shows a CTI Manager service failure on Unified CM subscriber C that is used to communicate with the Unified CCE PG. The CTI Manager services are running on all the Unified CM subscribers in the cluster, but only subscribers C and D are configured to connect to the Unified CCE PGs. During this failure, the PG will detect the loss of the JTAPI connection and fail-over to the redundant/duplex PG side.

The following conditions apply to this scenario:

All phones and gateways are registered with Unified CM subscriber A.

All phones and gateways are configured to re-home to Unified CM subscriber B (that is, B is the backup server). In this case they will not re-home because subscriber A is still functional.

Unified CM subscribers C and D are each running a local instance of CTI Manager and are designed to connect to the Unified CCE PGs.

If the Unified CM CTI Manager service on subscriber C fails, the PG side A detects a failure of the CTI Manager service and induces a failover to PG side B.

PG side B registers all dialed numbers and phones with the Unified CM CTI Manager service on subscriber D, and call processing continues.

Calls in progress on agent phones will stay in progress, but with no third-party call control (conference, transfer, and so forth) available from the agent desktop softphones. After an agent disconnects from all calls, that agent's desktop functionality is restored. Although the call stays active, Unified CCE loses visibility to the call and will write a Termination Call Detail (TCD) record to the ICM database for the call at the time of the failure, and no additional call data such as wrap-up codes will be written about the call after that point.

When the Unified CM CTI Manager service on subscriber C recovers, PG side B continues to be active and uses the CTI Manager service on Unified CM subscriber D. The PG does not fail-back in this model.

Figure 3-22 Scenario 4 - Only the Unified CM CTI Manager Service Fails

Unified CCE Scenarios for Clustering over the WAN

Unified CCE can also be overlaid with the Unified CM design model for clustering over the WAN, which allows for high availability of Unified CM resources across multiple locations and data center locations. There are a number of specific design requirements for Unified CM to support this deployment model, and Unified CCE adds its own specific requirements and new failover considerations to the model.

Specific testing has been performed to identify the design requirements and failover scenarios. The success of this design model relies on specific network configuration and setup, and the network must be monitored and maintained. The component failure scenarios noted previously (see Unified ICM Failover Scenarios) are still valid in this model, and the additional failure scenarios for this model include:

Scenario 1: Unified ICM Central Controller or Peripheral Gateway Private Network Failure

Scenario 2: Visible Network Failure

Scenario 3: Visible and Private Networks Both Fail (Dual Failure)

Scenario 4: Unified CCE Agent Site WAN (Visible Network) Failure


Note The terms public network and visible network are used interchangeably throughout this document.


Scenario 1: Unified ICM Central Controller or Peripheral Gateway Private Network Failure

In clustering over the WAN with Unified CCE, there should be a separate private network connection between the geographically distributed Central Controller (Call Router/Logger) and the split Peripheral Gateway pair to maintain state and synchronization between the sides of the system.

To understand this scenario fully, a brief review of the ICM Fault Tolerant architecture is warranted. On each call router, there is a process known as the Message Delivery Service (MDS), which delivers messages to and from local processes such as router.exe and which handles synchronization of messages to both call routers. For example, if a route request comes from the carrier or any routing client to side A, MDS ensures that both call routers receive the request. MDS also handles the duplicate output messages.

The MDS process ensures that duplex ICM sides are functioning in a synchronized execution, fault tolerance method. Both routers are executing everything in lockstep, based on input the router receives from MDS. Because of this synchronized execution method, the MDS processes must always be in communication with each other over the private network. They use TCP keep-alive messages generated every 100 ms to ensure the health of the redundant mate or the other side. Missing five consecutive TCP keep-alive messages indicates to Unified ICM that the link or the remote partner system might have failed.
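The keep-alive rule above (one message every 100 ms, failure suspected after five consecutive misses) can be illustrated with a simple counter. The Python sketch below is not the MDS implementation; the class and function names are hypothetical, and it models only the counting rule described in this section.

```python
from dataclasses import dataclass

# Values taken from the text above; the names are hypothetical.
KEEPALIVE_INTERVAL_MS = 100
MISSED_LIMIT = 5


@dataclass
class KeepAliveMonitor:
    """Counts consecutive missed keep-alives from the other side."""
    missed: int = 0

    def on_interval(self, keepalive_received: bool) -> bool:
        """Record one 100 ms interval; return True if failure is suspected."""
        if keepalive_received:
            self.missed = 0  # any received keep-alive resets the count
        else:
            self.missed += 1
        return self.missed >= MISSED_LIMIT


def detection_window_ms() -> int:
    """Time needed to miss MISSED_LIMIT consecutive keep-alives."""
    return KEEPALIVE_INTERVAL_MS * MISSED_LIMIT


if __name__ == "__main__":
    monitor = KeepAliveMonitor()
    print([monitor.on_interval(False) for _ in range(5)])
    print(detection_window_ms(), "ms")
```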

When running duplexed ICM sides, as recommended for all production systems, one MDS will be the enabled synchronizer and will be in a paired-enabled state. Its partner will be the disabled synchronizer and is said to be paired-disabled. Whenever the sides are running synchronized, the side A MDS will be the enabled synchronizer in paired-enabled state. Its partner, side B, will be the disabled synchronizer in paired-disabled state. The enabled synchronizer sets the ordering of input messages to the router and also maintains the master clock for the ICM system.

If the private network fails between the Unified ICM Central Controllers, the following conditions apply:

The Call Routers detect the failure by missing five consecutive TCP keep-alive messages. The currently enabled side (side A in most cases) transitions to an isolated-enabled state and continues to function as long as it is in communication with at least half of the PGs configured in the system.

The paired-disabled side (side B in most cases) transitions to an isolated-disabled state. This side will then check for device majority. If it does not have an Active or Idle DMP connection to more than half of the configured PGs in the system, it will stop processing and stay disabled.

If the B-Side has device majority (an Active or Idle connection to more than half the configured PGs), it will transition to a "Testing" state and send "Test Other Side" (TOS) messages to each PG. This message is used to ask the PG if it can see the Call Router on the other side (in this case, Router A).

As soon as any (even one) PG responds to the TOS message that the A-Side is still enabled, Router B remains in the Isolated-Disabled state and goes idle. Logger B will also go idle, as will all the DMP connections to the PGs for Router B. All call processing will continue on Side A without impact.

If all of the PGs reply that Side A is down, or not reachable, the B-Side Call Router would re-initialize in simplex mode (isolated-enabled) and take over all routing for the Unified ICM.

There is no impact to the agents, calls in progress, or calls in queue. The system can continue to function normally; however, the Call Routers will be in simplex mode until the private network link is restored.
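The decision sequence described above for the Call Routers can be summarized as a short function. The Python sketch below is a simplified model, not the Unified ICM implementation: the function names, parameters, and returned state strings are hypothetical, and it assumes that the enabled side is side A and the disabled side is side B, as in most cases.

```python
def side_a_continues(configured_pgs: int, pgs_connected_to_a: int) -> bool:
    """The enabled side keeps running with at least half of the configured PGs."""
    return 2 * pgs_connected_to_a >= configured_pgs


def side_b_state(configured_pgs: int, pgs_connected_to_b: int,
                 pgs_reporting_a_enabled: int) -> str:
    """Return the state of Router B after the private network fails.

    pgs_connected_to_b: PGs with an Active or Idle connection to Router B.
    pgs_reporting_a_enabled: PGs answering the TOS message that side A is
    still enabled.
    """
    if configured_pgs <= 0:
        raise ValueError("configured_pgs must be positive")
    if not 0 <= pgs_reporting_a_enabled <= pgs_connected_to_b <= configured_pgs:
        raise ValueError("inconsistent PG counts")
    # Device majority requires more than half of the configured PGs.
    if 2 * pgs_connected_to_b <= configured_pgs:
        return "isolated-disabled"
    # Testing state: a single PG that still sees side A keeps side B idle.
    if pgs_reporting_a_enabled >= 1:
        return "isolated-disabled"
    # Every PG reports side A as down: side B takes over in simplex mode.
    return "isolated-enabled"


if __name__ == "__main__":
    print(side_b_state(4, 2, 0))  # only half of the PGs: no device majority
    print(side_b_state(4, 3, 1))  # majority, but one PG still sees side A
    print(side_b_state(4, 3, 0))  # majority, and no PG sees side A
```

With an even number of configured PGs and exactly half of them reachable, side A continues and side B stays disabled, which is why stale PG entries in the configuration can affect the outcome.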

Additional Considerations

The Call Routers are "paired" with the Loggers and can read/write only to their own Logger for configuration and historical data over the Private Network locally. In the event that the failure is caused by the loss of a Private NIC card in the Call Router, and that Call Router is the enabled side, it will not be able to write any historical data to the Logger nor will any configuration changes be able to be made to the Logger database.

The Private NIC in the Call Router is also used in some cases to communicate with carrier-based Pre-Routing Network or SS7 interfaces. If the Private NIC fails, there would be no way to access these services either.

If an even number of PGs is checked off in the Call Router Setup, and only half of the PGs are available, then only Side A will run. For the B-Side to be operational during a private network failure, it must be able to communicate with more than half of the PGs in the system.

It is important to maintain the configuration so that "extra" PGs or PGs that are no longer on the network are removed from the Call Router Setup panels to avoid problems with determination of device majority for PGs that no longer exist.

If the private network fails between the Unified CM Peripheral Gateways, the following conditions apply:

The Peripheral Gateway sides detect a failure if they miss five consecutive TCP keep-alive messages, and they follow a process similar to the Call Routers, leveraging the MDS process when handling a private link failure. As with the Central Controllers, one MDS process is the enabled synchronizer and its redundant side is the disabled synchronizer. When running redundant PGs, as is always recommended in production, the A side will always be the enabled synchronizer.

After detecting the failure, the disabled synchronizer (side B) initiates a test of its peer synchronizer via the TOS procedure on the Public or Visible Network connection. If PG side B receives a TOS response stating that the A side synchronizer is enabled or active, then the B side immediately goes out of service, leaving the A side to run in simplex mode until the Private Network connection is restored. The PIM, OPC, and CTI SVR processes become active on PG side A, if not already in that state, and the CTI OS Server process still remains active on both sides as long as the PG side B server is healthy. If the B side does not receive a message stating that the A side is enabled, then side B continues to run in simplex mode and the PIM, OPC, and CTI SVR processes become active on PG side B if not already in that state. This condition should occur only if the PG side A server is truly down or unreachable due to a double failure of visible and private network paths.

There is no impact to the agents, calls in progress, or calls in queue because the agents stay connected to their already established CTI OS Server process connection. The system can continue to function normally; however, the PGs will be in simplex mode until the private network link is restored.

If the two private network connections are combined into one link, the failures follow the same path; however, the system runs in simplex mode on both the Call Router and the Peripheral Gateway. If a second failure were to occur at that point, the system could lose some or all of the call routing and ACD functionality.

Scenario 2: Visible Network Failure

The visible network in this design model is the network path between the data center locations where the main system components (Unified CM subscribers, Peripheral Gateways, Unified IP IVR/Unified CVP components, and so forth) are located. This network is used to carry all the voice traffic (RTP stream and call control signaling), Unified ICM CTI (call control signaling) traffic, as well as all typical data network traffic between the sites. In order to meet the requirements of Unified CM clustering over the WAN, this link must be highly available with very low latency and sufficient bandwidth. This link is critical to the Unified CCE design because it is part of the fault-tolerant design of the system, and it must be highly resilient as well:

The highly available (HA) WAN between the central sites must be fully redundant with no single point of failure. (For information regarding site-to-site redundancy options, refer to the WAN infrastructure and QoS design guides available at http://www.cisco.com/go/designzone.) In case of partial failure of the highly available WAN, the redundant link must be capable of handling the full central-site load with all QoS parameters. For more information, see the section on Bandwidth Requirements for Unified CCE Clustering Over the WAN, page 12-19.

A highly available (HA) WAN using point-to-point technology is best implemented across two separate carriers, but this is not necessary when using a ring technology.

If the visible network fails between the data center locations, the following conditions apply:

The Unified CM subscribers will detect the failure and continue to function locally, with no impact to local call processing and call control. However, any calls that were set up over this WAN link will fail with the link.

The Unified ICM Call Routers will detect the failure because the normal flow of TCP keep-alives from the remote Peripheral Gateways will stop. Likewise, the Peripheral Gateways will detect this failure by the loss of TCP keep-alives from the remote Call Routers. The Peripheral Gateways will automatically realign their data communications to the local Call Router, and the local Call Router will then use the private network to pass data to the Call Router on the other side to continue call processing. This does not cause a failover of the Peripheral Gateway or the Call Router.

Half the agents or more might be affected by this failure under the following circumstances:

If the agent desktop (Cisco Agent Desktop or CTI OS) is registered to the Peripheral Gateway on side A of the system but the physical phone is registered to side B of the Unified CM cluster.

Under normal circumstances, the phone events would be passed from side B to side A over the visible network via the CTI Manager Service to present these events to the side A Peripheral Gateway. The visible network failure will not force the IP phone to re-home to side A of the cluster, and the phone will remain operational on the isolated side B. The Peripheral Gateway will no longer be able to see this phone, and the agent will be logged out of Unified CCE automatically because the system can no longer direct calls to the agent's phone.

If the agent desktop (Cisco Agent Desktop or CTI OS) and IP phone are both registered to side A of the Peripheral Gateway and Unified CM, but the phone is reset and re-registers to a side-B Unified CM subscriber.

If the IP phone re-homes or is manually reset and forced to register to a side-B Unified CM subscriber, the Unified CM subscriber on side A that is providing the CTI Manager service to the local Peripheral Gateway will unregister the phone and remove it from service. Because the visible network is down, the remote Unified CM subscriber at side B cannot send the phone registration event to the remote Peripheral Gateway. Unified CCE will log out this agent because it can no longer control the phone for the agent.

If the agent desktop (CTI toolkit Agent Desktop or Cisco Agent Desktop) is registered to the CTI OS Server at the side-B site but the active Peripheral Gateway side is at the side-A site.

Under normal operation, the CTI toolkit Agent Desktops load-balance their connections across the CTI OS Server pair. At any given time, half the agent connections would be on a CTI OS server that has to cross the visible network to connect to the active Peripheral Gateway CTI Server (CG). When the visible network fails, the CTI OS Server detects the loss of connection with the remote Peripheral Gateway CTI Server (CG) and disconnects the active agent desktop clients to force them to re-home to the redundant CTI OS Server at the remote site. The CTI toolkit Agent Desktop is aware of the redundant CTI OS server and will automatically use this server. During this transition, the CTI toolkit Agent Desktop will be disabled and will return to operational state as soon as it is connected to the redundant CTI OS server. (The agent may be logged out or put into not-ready state, depending upon the /LOAD parameter defined for the Unified CM Peripheral Gateway in Unified ICM Config Manager.)
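The three circumstances above can be summarized as a small decision function. The following is a simplified sketch only, not part of any Cisco product: the names `Side`, `Impact`, and `visible_failure_impact` are hypothetical, and the model assumes that each agent is described by just three facts (the active Peripheral Gateway side, the side where the phone is registered, and the side of the CTI OS Server the desktop is connected to).

```python
from enum import Enum


class Side(Enum):
    A = "A"
    B = "B"


class Impact(Enum):
    NONE = "not affected"
    DESKTOP_REHOMES = "desktop disabled until it re-homes to the redundant CTI OS Server"
    LOGGED_OUT = "agent logged out because the PG can no longer see the phone"


def visible_failure_impact(active_pg: Side, phone: Side, ctios_server: Side) -> Impact:
    """Classify the effect of a visible network failure on one agent.

    Simplified illustration of the cases described in the text.
    """
    # Phone registered on the side opposite the active PG: phone events can
    # no longer cross the visible network, so the agent is logged out.
    if phone is not active_pg:
        return Impact.LOGGED_OUT
    # Desktop connected to the CTI OS Server opposite the active PG: the
    # server drops the client, which re-homes to the redundant server.
    if ctios_server is not active_pg:
        return Impact.DESKTOP_REHOMES
    # Phone, desktop connection, and active PG are all on the same side.
    return Impact.NONE


if __name__ == "__main__":
    print(visible_failure_impact(Side.A, Side.B, Side.A).value)
    print(visible_failure_impact(Side.A, Side.A, Side.B).value)
    print(visible_failure_impact(Side.A, Side.A, Side.A).value)
```

In this model, the logged-out case takes precedence because an agent whose phone is invisible to the Peripheral Gateway cannot receive calls regardless of where the desktop is connected.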

Scenario 3: Visible and Private Networks Both Fail (Dual Failure)

Individually, the private and visible networks can fail with limited impact to the Unified CCE agents and calls. However, if both of these networks fail at the same time, the system will be reduced to very limited functionality. This failure should be considered catastrophic and should be avoided by careful WAN design, with backup and resiliency built into the design.

If both the visible and private networks fail at the same time, the following conditions apply:

The Unified CM subscribers will detect the failure and continue to function locally, with no impact to local call processing and call control. However, any calls that were set up and are sending the active voice path media over the visible WAN link will fail with the link. When the call fails, the Unified CCE PG will see the call drop and will write a Termination Call Detail (TCD) record in the ICM database for that call at the time it is dropped.

The Call Routers and Peripheral Gateways will detect the private network failure after missing five consecutive TCP keep-alive messages. These TCP keep-alive messages are generated every 100 ms, and the failure will be detected within about 500 ms on this link.

The Call Routers will attempt to contact their Peripheral Gateways with the test-other-side message to determine if the failure was a network issue or if the remote Call Router had failed and was no longer able to send TCP keep-alive messages. The Call Routers determine which side will continue to be active (typically, this would be the A-Side of the system because it is the side with the most active Peripheral Gateway connections), and that side will stay active in simplex mode while the remote Call Router and PGs will be in isolated-disabled mode. The Call Routers will send a message to the Peripheral Gateways to realign their data feeds to the active Call Router only.

The Peripheral Gateways will determine which side has the active Unified CM connection. However, it will also consider the state of the Call Router, and the Peripheral Gateway will not remain active if it is not able to connect to an active Call Router. Typically, this will force the A-Side PGs into active simplex enabled mode and the B-Side into isolated-disabled.

The surviving Call Router and Peripheral Gateways will detect the failure of the visible network by the loss of TCP keep-alives on the visible network. These keep-alives are sent every 400 ms, so it can take up to two seconds before this failure is detected.

The Call Router will be able to see only the local Peripheral Gateways, which are those used to control local Unified IP IVRs or Unified CVP Call Servers and the local half of the Unified CM cluster. The remote Unified IP IVRs or CVP Call Servers will be off-line with no Unified ICM Call Control via the GED-125 IVR PG interface. The Unified ICM Call Routing Scripts automatically route around these off-line devices using the peripheral-on-line status checks. Calls that were in progress in the off-line IP-IVRs will either drop or use the local default script in the IP-IVR or the Call Forward on Error settings in Unified CM. Calls under Unified CVP control from the off-line Call Servers will get treatment from the survivability TCL script in their ingress voice gateways. For calls that were in progress but are no longer visible to Unified CCE, a Termination Call Detail (TCD) record is written to the ICM database for the call data up to the time of the failure. If the default or survivability scripts redirect the calls to another active Unified CCE component, the call will appear as a "new call" to the system, with no relationship to the original call for reporting or tracking purposes.

Any new calls that come into the disabled side will not be routed by the Unified CCE, but they can be redirected or handled using standard Unified CM redirect on failure for their CTI route points or the Unified CVP survivability TCL script in the ingress voice gateways.

Agents will be impacted as noted above if their IP phones are registered to the side of the Unified CM cluster opposite the location of their active Peripheral Gateway and CTI OS Server connection. Only agents that were active on the surviving side of the Peripheral Gateway with phones registered locally to that site will not be impacted.

At this point, the Call Router and Unified CM Peripheral Gateway will run in simplex mode, and the system will accept new calls from only the surviving side for Unified CCE call treatment. The Unified IP IVR/Unified CVP functionality will also be limited to the surviving side.
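The detection times quoted in this scenario follow from the keep-alive interval multiplied by the number of consecutive missed messages that are tolerated. The sketch below works through that arithmetic. The `KeepAliveLink` class and its field names are hypothetical. The five-miss threshold for the visible network is an assumption inferred from the stated 400 ms interval and two-second detection time; the text gives the threshold explicitly only for the private network.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeepAliveLink:
    """Illustrative model of a link monitored by periodic keep-alives."""
    name: str
    interval_ms: int       # time between keep-alive messages
    missed_threshold: int  # consecutive misses before a failure is declared

    def worst_case_detection_ms(self) -> int:
        # A failure is declared once `missed_threshold` intervals pass
        # without a keep-alive arriving.
        return self.interval_ms * self.missed_threshold


# Private network: 100 ms interval, five misses (both stated in the text).
PRIVATE = KeepAliveLink("private", interval_ms=100, missed_threshold=5)
# Visible network: 400 ms interval (stated); five misses is an assumption.
VISIBLE = KeepAliveLink("visible", interval_ms=400, missed_threshold=5)

if __name__ == "__main__":
    for link in (PRIVATE, VISIBLE):
        print(f"{link.name}: up to {link.worst_case_detection_ms()} ms")
```

Under these assumptions the private network failure is detected in about 500 ms and the visible network failure in up to 2000 ms, matching the figures in the text.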

Scenario 4: Unified CCE Agent Site WAN (Visible Network) Failure

The Unified CCE design model for clustering over the WAN assumes the Unified CCE agents are remotely located at multiple sites connected by the visible WAN. Each agent location requires WAN connectivity to both of the data center locations across the visible WAN where the Unified CM and Unified ICM components are located. These connections should provide redundancy and should also make use of basic SRST functionality in the event of a complete network failure, so that the remote site would still have basic dial tone service to make emergency (911) calls.

If side A of the WAN at the Unified CCE Agent Site fails, the following conditions apply:

Any IP phones that are homed to the side-A Unified CM subscribers will automatically re-home to the side-B subscribers (provided the redundancy group is configured).

Agent desktops that are connected to the CTI OS or Cisco Agent Desktop server at that site will automatically realign to the redundant CTI OS server at the remote site. (Agent desktops will be disabled during the realignment process.)

If both sides of the WAN at the Unified CCE Agent Site fail, the following conditions apply:

The local voice gateway will detect the failure of the communications path to the Unified CM cluster and will go into SRST mode to provide local dial-tone functionality. With Unified CVP, these gateways detect the loss of the CVP Call Server and execute their local survivability TCL script to reroute the inbound calls. Active calls in Unified CVP locally would no longer be visible to Unified CCE, so a Termination Call Detail (TCD) record would be written to the ICM database at the time of the failure and tracking of the call would stop at that point. The call would execute the local survivability TCL script, which could redirect it using the PSTN to another Unified CCE site that remains active; however, the call would then appear as a "new call" to Unified CCE and would have no relationship with the original call information. If the call is retained locally and redirected via SRST to a local phone, Unified CCE would not have visibility to the call from that point forward.

The agent desktop will detect the loss of connectivity to the CTI OS Server (or Cisco Agent Desktop Server) and automatically log the agent out of the system. While the IP phones are in SRST mode, they will not be able to function as Unified CCE agents.

Understanding Failure Recovery

This section analyzes the failover recovery of each individual part (products and subcomponents inside each product) of the Unified CCE solution.

Unified CM Service

In larger deployments, it is possible that the Unified CM to which the agent phones are registered will not be running the CTI Manager service that communicates with the Unified CM Peripheral Gateway for Unified CCE. When an active Unified CM (call processing) service fails, all the devices registered to it are reported "out of service" by the CTI Manager service locally and to any external client, such as the Peripheral Gateway on a different subscriber CTI Manager service.

Unified CM call detail reporting (CDR) shows the call as terminated when the Unified CM failure occurred, although the call may have continued for several minutes after the failure because calls in progress stay in progress. IP phones of agents not on calls at the time of failure will quickly register with the backup Unified CM subscriber. The IP phone of an agent on a call at the time of failure will not register with the backup Unified CM subscriber until after the agent completes the current call. If MGCP, H.323, or SIP gateways are used, then the calls in progress survive, but further call control functions (hold, retrieve, transfer, conference, and so on) are not possible.

Unified CCE will also write a call record to the Termination Call Detail (TCD) table because Unified CM has reported the call as terminated to the Unified CCE PG. If the call continues after the PG has failed-over, a second TCD record will be written as a "new call" not related to the original call.

When the active Unified CM subscriber fails, the PG receives out-of-service events from Unified CM and logs out the agents. To continue receiving calls, the agents must wait for their phones to re-register with a backup Unified CM subscriber, then log back into their Unified CCE desktop application to have its functionality restored. Upon recovery of the primary Unified CM subscriber, the agent phones re-register to their original subscriber to return the cluster to the normal state, with phones and devices properly balanced across multiple active subscribers.

In summary, the Unified CM call processing service is separate from the CTI Manager service, which connects to the Unified CM PG via JTAPI. The Unified CM call processing service is responsible for registering the IP phones, and its failure does not affect the Unified CM PGs. From a Cisco Unified CCE perspective, the PG does not go off-line because the Unified CM server running CTI Manager remains operational. Therefore, the PG does not need to fail-over.

Unified IP IVR

When a CTI Manager service fails, the Unified IP IVR JTAPI subsystem shuts down and restarts by trying to connect to the secondary CTI Manager service on a backup Unified CM subscriber in the cluster. In addition, all voice calls at this Unified IP IVR are dropped. If there is an available secondary CTI Manager service on a backup subscriber, the Unified IP IVR logs into this CTI Manager service on that subscriber and re-registers all the CTI ports associated with the Unified IP IVR JTAPI user. After all the Unified CM devices are successfully registered with the Unified IP IVR JTAPI user, the server resumes its Voice Response Unit (VRU) functions and handles new calls. This action does not impact the Unified CVP because it does not depend upon the Unified CM CTI Manager service for call control.

Unified IP IVR Release 3.5 provided for cold standby and Release 4.0 provides hot standby redundancy, but this configuration is not supported for use with Unified CCE. These designs make use of a redundant server that is not used unless there is a failure of the primary Unified IP IVR server. However, during this failover processing, all calls that are in queue or treatment are dropped on the Unified IP IVR as part of the failover. A more resilient design would be to deploy a second (or more) Unified IP IVR server(s) and have them all active, allowing the Unified CCE to load-balance calls across them automatically. As shown in Figure 3-23, if one of the Unified IP IVR servers should fail, only the calls on that server would fail, but the other active servers would remain active and be able to accept new calls in the system.

Unified ICM

The Unified ICM is a collection of services and processes running on Unified ICM servers. The failover and recovery process for each of these services is unique and requires careful examination to understand the impact on other parts of the Unified CCE solution, including other Unified ICM services.

Unified CM PG and CTI Manager Service

When the active CTI Manager Service or PG software fails, the PG JTAPI Gateway/PIM detects an OUT_OF_SERVICE event and induces a failover to the redundant (duplex) PG. Because the redundant PG is logged into the backup Unified CM subscriber CTI Manager Service already, it registers the IP phones and configured dialed numbers or CTI route points automatically. This initialization service takes place at a rate of about 5 devices per second. The agent desktops show the agents as being logged out or not ready, and a message displays stating that their routing client or peripheral (Unified CM) has gone off-line. (This warning can be turned on or off, depending on the administrator's preference.) All agents and supervisors lose their desktop third-party call control functionality until the failure recovery is complete. The agents and supervisors can recognize this event because call control action buttons on the desktop will gray out, and they will not be able to do anything with the desktop. Any existing calls should remain active without any impact to the caller.
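The registration rate of about 5 devices per second gives a rough way to estimate how long the PG needs to finish initializing after a failover. The following sketch is a back-of-the-envelope calculation only; the function name and the example device counts are hypothetical, and actual recovery time depends on factors that this simple division does not capture.

```python
DEVICES_PER_SECOND = 5.0  # approximate rate stated in the text


def estimated_registration_seconds(device_count: int,
                                   rate: float = DEVICES_PER_SECOND) -> float:
    """Estimate the time for the redundant PG to register all devices.

    device_count covers IP phones plus dialed numbers / CTI route points.
    """
    if device_count < 0:
        raise ValueError("device_count must not be negative")
    if rate <= 0:
        raise ValueError("rate must be positive")
    return device_count / rate


if __name__ == "__main__":
    for count in (500, 2000):
        seconds = estimated_registration_seconds(count)
        print(f"{count} devices: about {seconds:.0f} s ({seconds / 60:.1f} min)")
```

For example, at this rate a PG with 2,000 monitored devices would need roughly 400 seconds before every device is registered again, which is why recovery numbers on the CTI route points matter for calls that arrive in the meantime.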

In the event that calls arrive at the CTI Route Points in Unified CM during a PG failover and the PIM is not yet fully operational, these calls will fail unless these route points are configured with a recovery number in their "Call Forward on Unregistered" or "Call Forward on Failure" setting. These recovery numbers could be the Cisco Unity voicemail system for the Auto Attendant, or perhaps the company operator position, to ensure the incoming calls are getting answered.


Note: Agents should not push any buttons during desktop failover because these keystrokes can be buffered and sent to the CTI server when it completes its failover and restores the agent states.


When an active PG fails over to the idle side, calls still in progress will be recovered by querying Unified CM as part of the activation sequence. There will be two Termination Call Detail records providing information on the call prior to and after the PG transition. Peripheral call variables and ECC variables will be lost on the agent desktop. Indication of whether the call was a barge-in or a conference call will be lost on the agent desktop and in reports. Calls that were in the wrap-up state will not be recovered. Agents will be able to release, transfer, or conference calls from their agent desktop after activation completes.


Note: Call and agent state information might not be complete at the end of a failover if there are call status and agent state changes during the failover window.


Unified ICM Voice Response Unit PG

When a Voice Response Unit (VRU) PG fails, all the calls currently in queue or treatment on that Unified IP IVR are dropped unless there is a default script application defined or the CTI Ports have a recovery number defined in Unified CM for their "Call Forward on Failure" setting. Calls in progress or queued in Unified CVP are not dropped; the survivability TCL script in the voice gateway redirects them to a secondary Unified CVP or to a number in the H.323 or SIP dial plan, if one is available.

The redundant (duplex) VRU PG side will connect to the Unified IP IVR or CVP and begin processing new calls upon failover. Upon recovery of the failed VRU PG side, the currently running VRU PG continues to operate as the active VRU PG. Therefore, having redundant VRU PGs adds significant value because it allows an IP IVR or CVP to continue to function as an active queue point or to provide call treatment. Without VRU PG redundancy, a VRU PG failure would block use of that IP IVR even though the IP IVR is working properly. (See Figure 3-23.)

Figure 3-23 Redundant Unified ICM VRU PGs with Two IP IVR Servers

Unified ICM Call Router and Logger

The Unified ICM Central Controllers or Unified ICM Servers are shown in these diagrams as a single set of redundant servers. However, depending upon the size of the implementation, they could be deployed with multiple servers to host the following key software processes:

Unified ICM Call Router

The Unified ICM Call Router is the brain of the system, and it maintains a constant memory image of the state of all the agents, calls, and events in the system. It performs the call routing in the system, executing the user-created Unified ICM Routing Scripts and populating the real-time reporting feeds for the Administrative Workstation. The Call Router software runs in synchronized execution, with both of the redundant servers running the same memory image of the current state across the system. They keep this information updated by passing the state events between the servers on the private LAN connection.

Unified ICM Logger and Database Server

The Unified ICM Logger and Database Server maintain the system database for the configuration (agent IDs, skill groups, call types, and so forth) and scripting (call flow scripts) as well as the historical data from call processing. The Loggers receive data from their local Call Router process to store in the system database. Because the Call Routers are synchronized, the Logger data is also synchronized. In the event that the two Logger databases are out of synchronization, they can be resynchronized manually by using the Unified ICMDBA application over the private LAN. The Logger also provides a replication of its historical data to the customer Historical Database Server (HDS) Administrative Workstations over the visible network.

In the event that one of the Unified ICM Call Routers should fail, the surviving server will detect the failure after missing five consecutive TCP keep-alive messages on the private LAN. The Call Routers generate these TCP keep-alive messages every 100 ms, so it will take up to 500 ms to detect this failure. Upon detection of the failure, the surviving Call Router will contact the Peripheral Gateways in the system to verify the type of failure that occurred. The loss of TCP keep-alive messages on the private network could be caused by either of the following conditions:

Private network outage — It is possible for the private LAN switch or WAN to be down but for both of the Unified ICM Call Routers to still be fully operational. In this case, the Peripheral Gateways will still see both of the Unified ICM Call Routers even though they cannot see each other over the private network to provide synchronization data. If the disabled synchronizer (Call Router B) can communicate with a majority of the PGs, it will then send a Test Other Side (TOS) message to the PGs sequentially to determine if the Call Router on the other side (Side A) is enabled. If Call Router B receives a message that side A is in fact enabled, then Call Router A will run in simplex until the private network is restored. If all the PGs reply to the TOS message and indicate that side A is down, then side B re-initializes in simplex mode.

Call Router hardware failure — It is possible for the Call Router on the other side to have a physical hardware failure and be completely out of service. In this case, the Peripheral Gateways would report that they can no longer see the Call Router on the other side, and the surviving Call Router would take over the active processing role in simplex mode. This failure is detected by the Call Routers from the loss of heartbeat keep-alives on the Private Network.
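The Test Other Side (TOS) procedure described above can be sketched as a decision function for the disabled side. This is an illustrative model only, not the Unified ICM implementation: the names `SideBOutcome` and `resolve_side_b` are hypothetical, each PG reply is reduced to a single value, and treating a missing reply from a reachable majority as "stay disabled" is a conservative assumption that the text does not spell out.

```python
from enum import Enum
from typing import Optional, Sequence


class SideBOutcome(Enum):
    STAY_DISABLED = "stay disabled"
    RUN_SIMPLEX = "re-initialize in simplex mode"


def resolve_side_b(tos_replies: Sequence[Optional[bool]]) -> SideBOutcome:
    """Decide what Call Router B does after losing the private network.

    tos_replies holds one entry per configured PG:
      True  -> the PG reports that side A is enabled
      False -> the PG reports that side A is down
      None  -> the PG could not be reached from side B
    """
    total = len(tos_replies)
    reachable = [reply for reply in tos_replies if reply is not None]

    # Side B only runs the TOS procedure if it can reach a majority of PGs.
    if len(reachable) * 2 <= total:
        return SideBOutcome.STAY_DISABLED
    # Any PG reporting side A as enabled means A keeps running in simplex.
    if any(reachable):
        return SideBOutcome.STAY_DISABLED
    # Every PG replied and every reply says side A is down.
    if len(reachable) == total:
        return SideBOutcome.RUN_SIMPLEX
    # Assumption: with some PGs silent, side B does not take over.
    return SideBOutcome.STAY_DISABLED


if __name__ == "__main__":
    print(resolve_side_b([True, False, False]).value)
    print(resolve_side_b([False, False, False]).value)
```

The first call models a private network outage in which side A is still running, and the second models a Call Router A hardware failure seen by every PG.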

During the Call Router failover processing, any Route Requests sent to the Call Router from a Carrier Network Interface Controller (NIC) or Peripheral Gateway will be queued until the surviving Call Router is in active simplex mode. Any calls in progress in the IVR or at an agent will not be impacted.

If one of the Unified ICM Logger and Database Servers were to fail, there would be no immediate impact except that the local Call Router would no longer be able to store data from call processing. The redundant Logger would continue to accept data from its local Call Router. When the Logger server is restored, the Logger will contact the redundant Logger to determine how long it had been off-line. If the Logger was off-line for less than 12 hours, it will automatically request all the transactions it missed from the redundant Logger while it was off-line. The Loggers maintain a recovery key that tracks the date and time of each entry recorded in the database, and these keys are used to restore data to the failed Logger over the private network.

If the Logger was off-line for more than 12 hours, the system will not automatically resynchronize the databases. In this case, resynchronization has to be done manually using the Unified ICMDBA application. Manual resynchronization allows the system administrator to decide when to perform this data transfer on the private network, perhaps scheduling it during a maintenance window when there would be little call processing activity in the system.
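The 12-hour rule for Logger recovery reduces to a single threshold comparison. The sketch below is illustrative only; the function and constant names are hypothetical, and the text does not say what happens at exactly 12 hours, so this model assumes that the boundary case falls on the manual side.

```python
from datetime import timedelta

AUTO_RECOVERY_LIMIT = timedelta(hours=12)  # threshold stated in the text


def logger_recovery_mode(offline_for: timedelta) -> str:
    """Return how a restored Logger catches up with the redundant Logger."""
    if offline_for < timedelta(0):
        raise ValueError("offline_for must not be negative")
    # Less than 12 hours: missed transactions are requested automatically.
    if offline_for < AUTO_RECOVERY_LIMIT:
        return "automatic"
    # Otherwise (including exactly 12 hours, by assumption) the administrator
    # resynchronizes manually with the Unified ICMDBA application.
    return "manual"


if __name__ == "__main__":
    for hours in (2, 11.5, 12, 30):
        print(f"{hours} h off-line: {logger_recovery_mode(timedelta(hours=hours))}")
```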

The Logger replication process that sends data from the Logger database to the HDS Administrative Workstations will automatically replicate each new row written to the Logger database when the synchronization takes place as well.

There is no impact to call processing during a Logger failure; however, the HDS data that is replicated from that Logger would stop until the Logger can be restored.

Additionally, if the Unified Outbound Option is used, the Campaign Manager software is loaded on Logger A only. If that platform is out of service, any outbound calling will stop until the Logger can be restored to operational status.

Administrative Workstation Real-Time Distributor (RTD)

The Administrative Workstation (AW) Real-Time Distributor (RTD) provides the user interface to the system for making configuration and scripting changes. It can also host the web-based reporting tool WebView and the Internet Script Editor.

These servers do not support redundant or duplex operation, as the other Unified ICM system components do. However, you can deploy multiple Administrative Workstation servers to provide redundancy for the Unified CCE. (See Figure 3-24.)

Figure 3-24 Redundant Unified ICM Distributors and AW Servers

Administrative Workstation Real-Time Distributors are clients of the Unified ICM Call Router real-time feed that provides real-time information about the entire Unified CCE across the enterprise. Real-Time Distributors at the same site can be set up as part of an Admin Site that includes a designated primary real-time distributor and one or more secondary real-time distributors. Another option is to add Client Admin Workstations which do not have their own local SQL databases and are homed to a Real-Time Distributor locally for their SQL database and real-time feed.

The Admin Site reduces the number of real-time feed clients the Unified ICM Call Router has to service at a particular site. For remote sites, this is important because it can reduce the required bandwidth to support remote Admin Workstations across a WAN connection.

When using an Admin Site, the primary real-time distributor is the one that will register with the Unified ICM Call Router for the real-time feed, and the other real-time distributors within that Admin Site register with the primary real-time distributor for the real-time feed. If the primary real-time distributor is down or does not accept the registration from the secondary real-time distributors, they will register with the Unified ICM Call Router for the real-time feed. Client AWs that cannot register with the primary or secondary real-time distributors will not be able to perform any Admin Workstation tasks until the distributors are restored.
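The registration fallback within an Admin Site can be sketched as two small selection functions. This is a simplified illustration with hypothetical function names and string labels; in particular, trying the primary distributor before the secondary for a Client AW is an assumption, because the text states only that a Client AW needs one of them to be available.

```python
from typing import Optional


def feed_source_for_secondary(primary_up: bool, primary_accepts: bool) -> str:
    """Where a secondary real-time distributor gets its real-time feed."""
    # Normal case: the secondary registers with the primary distributor.
    if primary_up and primary_accepts:
        return "primary distributor"
    # Primary down or refusing registration: go directly to the Call Router.
    return "Call Router"


def feed_source_for_client_aw(primary_available: bool,
                              secondary_available: bool) -> Optional[str]:
    """Where a Client AW is homed, or None if it cannot do any AW tasks."""
    if primary_available:
        return "primary distributor"
    if secondary_available:
        return "secondary distributor"
    # No distributor reachable: the Client AW is unusable until one returns.
    return None


if __name__ == "__main__":
    print(feed_source_for_secondary(primary_up=False, primary_accepts=False))
    print(feed_source_for_client_aw(False, True))
    print(feed_source_for_client_aw(False, False))
```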

Alternatively, each real-time distributor could be deployed in its own Admin Site regardless of the physical site of the device. This deployment will create more overhead for the Unified ICM Call Router to maintain multiple real-time feed clients; however, it will prevent a failure of the primary real-time distributor from taking down the secondary distributors at the site.

Additionally, if the Admin Workstation is being used to host the ConAPI interface for the Multi-Channel Options (Cisco Email Manager Option and Cisco Collaboration Server Option) or the Cisco Unified Contact Center Management Portal (Unified CCMP), any configuration changes made to the Unified ICM, Cisco Email Manager, Cisco Collaboration Server, or Unified CCMP systems will not be passed over the ConAPI interface until it is restored.

CTI Server

The CTI Server monitors the data traffic of the Unified CM PIM on the Agent PG for specific CTI messages (such as call ringing or off-hook events) and makes those messages available to CTI clients such as the CTI OS Server or Cisco Agent Desktop Enterprise Server. It also processes third-party call control messages (such as make call or answer call) from the CTI clients and sends those messages via the PIM interface of the PG to Unified CM to process the event on behalf of the agent desktop.

CTI Server is redundant and co-resident on the Agent PG servers. (See Figure 3-25.) It does not, however, maintain agent state in the event of a failure. Upon failure of the CTI Server, the redundant CTI server becomes active and begins processing call events. CTI OS Server is a client of the CTI Server and is designed to monitor both CTI Servers in a duplex environment and maintain the agent state during failover processing. CTI OS agents will see their desktop buttons gray-out during the failover to prevent them from attempting to perform tasks while the CTI Server is down. The buttons will be restored as soon as the redundant CTI Server is restored, and the agent does not have to log on again to the desktop application.

The CTI Server is also critical to the operation of the Multi-Channel Options (Cisco Email Manager and Cisco Collaboration Server) as well as the Unified Outbound Option. If the CTI Server is down on both sides of the duplex agent Peripheral Gateway pair, none of the agents for that Agent Peripheral Gateway will be able to log into these applications.

Figure 3-25 Redundant CTI Servers Co-Located on Agent PG

CTI OS Considerations

CTI OS Server is a software component that runs co-located on the Unified CM Peripheral Gateway. CTI OS Server software is designed to be fault-tolerant and is typically deployed on redundant physical servers; however, unlike the PG processes that run in hot-standby mode, both of the CTI OS Server processes run in active mode all the time. The CTI OS Server processes are managed by NodeManager, which monitors each process running as part of the CTI OS service and which automatically restarts abnormally terminated processes.

CTI OS handles failover of related components as described in the following scenarios (see Figure 3-26).

Figure 3-26 Redundant CTI OS Server Processes

Scenario 1: CTI Server Side A (Active) Fails

In this scenario, CTI Server side A is co-located on PG side A, and the following events occur:

CTI Server side B detects the failure of side A and becomes active.

NodeManager restarts CTI Server side A, which then becomes idle.

Both CTI OS Server sides A and B drop all CTI OS client/agent connections and restart after losing the connection to CTI Server A. At startup, CTI OS Server sides A and B stay in CONNECTING state until they connect to CTI Server side B, and then they go to CONFIGURING state, where they download agent and call states and configuration information. CTI OS Client connections are not accepted by CTI OS Server A and B during CONNECTING and CONFIGURING states. When CTI OS Server synchronizes with CTI Server, the state becomes ACTIVE and it is now ready to accept CTI OS Client connections.

Both CTI OS Clients 1 and 2 lose their connections to the CTI OS Servers, and they each randomly select one CTI OS Server to connect to. CTI OS Client 1 can be connected to either CTI OS Server A or B, and the same is true for CTI OS Client 2. During this transition, the buttons of the CTI Toolkit Agent Desktop will be disabled and will return to operational state as soon as it is connected to a CTI OS Server.
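The CTI OS Server startup sequence in this scenario behaves like a three-state machine in which client connections are accepted only in the final state. The following sketch models that behavior for illustration; the class and method names are hypothetical and do not correspond to any CTI OS API, and only the state names CONNECTING, CONFIGURING, and ACTIVE come from the text.

```python
from enum import Enum


class CtiOsState(Enum):
    CONNECTING = "CONNECTING"
    CONFIGURING = "CONFIGURING"
    ACTIVE = "ACTIVE"


class CtiOsServerModel:
    """Toy model of one CTI OS Server process."""

    def __init__(self) -> None:
        self.state = CtiOsState.CONNECTING
        self.clients: set[str] = set()

    def on_cti_server_connected(self) -> None:
        # Connected to the active CTI Server: start downloading agent and
        # call states and configuration information.
        if self.state is CtiOsState.CONNECTING:
            self.state = CtiOsState.CONFIGURING

    def on_synchronized(self) -> None:
        # Download finished: ready to accept CTI OS Client connections.
        if self.state is CtiOsState.CONFIGURING:
            self.state = CtiOsState.ACTIVE

    def on_cti_server_lost(self) -> None:
        # Losing the CTI Server drops every client and restarts the server.
        self.clients.clear()
        self.state = CtiOsState.CONNECTING

    def accept_client(self, client_id: str) -> bool:
        # Connections are refused during CONNECTING and CONFIGURING.
        if self.state is not CtiOsState.ACTIVE:
            return False
        self.clients.add(client_id)
        return True


if __name__ == "__main__":
    server = CtiOsServerModel()
    print(server.accept_client("client-1"))  # refused while CONNECTING
    server.on_cti_server_connected()
    server.on_synchronized()
    print(server.accept_client("client-1"))  # accepted once ACTIVE
```

In the scenario above, both CTI OS Servers would pass through `on_cti_server_lost` when CTI Server A fails and would return to ACTIVE only after synchronizing with CTI Server B.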

Scenario 2: CTI Server B (Idle) Fails

In this scenario, CTI Server side B is co-located on PG side B but was not the active side. The following events occur:

CTI Server side A stays active.

NodeManager restarts CTI Server side B, which stays idle.

Neither CTI OS Clients nor CTI OS Servers are affected by this failure.

Scenario 3: CTI OS Server A Fails

In this scenario, CTI OS Server side A processes are co-located on PG/CTI Server side A. The following events occur:

CTI OS Client 1 detects the loss of network connection and automatically connects to CTI OS server B. During this transition, the buttons of the CTI Toolkit Agent Desktop will be disabled and will return to operational state as soon as it is connected to CTI OS server B.

CTI OS Client 2 stays connected to CTI OS Server B.

NodeManager restarts CTI OS Server A.

Scenario 4: CTI OS Server B Fails

In this scenario, CTI OS Server side B processes are co-located on PG/CTI Server side B. The following events occur:

CTI OS Client 2 detects the loss of network connection and automatically connects to CTI OS Server A. During this transition, the buttons of the CTI Toolkit Agent Desktop are disabled; they return to an operational state as soon as the client is connected to CTI OS Server A.

CTI OS Client 1 stays connected to CTI OS Server A.

NodeManager restarts CTI OS Server B.

Scenario 5: CTI OS Client 1 Fails

In this scenario, the following events occur:

The agent manually restarts the CTI OS Client 1 application.

CTI OS Client 1 randomly selects one CTI OS Server to connect to. (CTI OS Client 1 can be connected to either CTI OS Server A or B.)

Once connected, the agent logs in, and CTI OS Client 1 recovers its state by getting agent and call states through the CTI OS Server to which it is connected.

Scenario 6: CTI OS Client 2 Fails

In this scenario, the following events occur:

The agent manually restarts the CTI OS Client 2 application.

CTI OS Client 2 randomly selects one CTI OS Server to connect to. (CTI OS Client 2 can be connected to either CTI OS Server A or B.)

Once connected, the agent logs in, and CTI OS Client 2 recovers its state by getting agent and call states through the CTI OS Server to which it is connected.

Scenario 7: Network Failure Between CTI OS Client 1 and CTI OS Server A

In this scenario, the following events occur:

CTI OS Server A drops the connection of CTI OS Client 1.

CTI OS Client 1 detects the loss of network connection and automatically connects to CTI OS Server B. During this transition, the buttons of the CTI Toolkit Agent Desktop are disabled; they return to an operational state as soon as the client is connected to CTI OS Server B.

Scenario 8: Network Failure Between CTI OS Client 1 and CTI OS Server B

CTI OS Client 1 is not affected by this failure because it is connected to CTI OS Server A.

Scenario 9: Network Failure Between CTI OS Client 2 and CTI OS Server A

CTI OS Client 2 is not affected by this failure because it is connected to CTI OS Server B.

Scenario 10: Network Failure Between CTI OS Client 2 and CTI OS Server B

In this scenario, the following events occur:

CTI OS Server B drops the connection of CTI OS Client 2.

CTI OS Client 2 detects the loss of network connection and automatically connects to CTI OS Server A. During this transition, the buttons of the CTI Toolkit Agent Desktop are disabled; they return to an operational state as soon as the client is connected to CTI OS Server A.
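The client behavior in Scenarios 3 through 10 follows one rule: a client keeps a healthy connection, and otherwise connects to whichever CTI OS Server of the pair it can still reach (choosing at random when both are reachable, as at startup). The Python sketch below is a simplified model of that rule under stated assumptions. The function name and its parameters are hypothetical, and the real client performs connection attempts and heartbeat detection that are not modeled here.

```python
import random


def choose_server(servers, reachable, current=None, rng=random):
    """Return the CTI OS Server a client should be connected to.

    servers   -- names of the configured server pair, for example ["A", "B"]
    reachable -- set of servers the client can currently reach
    current   -- server the client is connected to, or None at startup
    rng       -- random source (injectable so the choice can be tested)
    """
    if current in reachable:
        return current  # connection is healthy; the failure does not affect it
    candidates = [s for s in servers if s in reachable]
    if not candidates:
        return None     # no server reachable; desktop buttons stay disabled
    return rng.choice(candidates)


# Scenario 7: Client 1 loses its network path to Server A and moves to B.
print(choose_server(["A", "B"], {"B"}, current="A"))
# Scenario 8: Client 1 loses its path to Server B but is connected to A.
print(choose_server(["A", "B"], {"A"}, current="A"))
```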

Cisco Agent Desktop Considerations

Cisco Agent Desktop is a client of CTI OS, which provides for automatic failover and redundancy for the Cisco Agent Desktop Server. If the Unified CM Peripheral Gateway or CTI Server (CG) fails over, CTI OS maintains the agent state and information during the failover to prevent agents from being logged out by the system because of the failover.

The Cisco Agent Desktop Servers (Enterprise Server, Chat, RASCAL, and so forth) can also be deployed redundantly to allow for failover of the core Cisco Agent Desktop components. Cisco Agent Desktop software is aware of the redundant Cisco Agent Desktop Servers and will automatically fail over in the event of a Cisco Agent Desktop Server process or hardware failure.

Design Considerations for Unified CCE System Deployment with Unified ICM Enterprise

In Unified CCE 7.0, a new deployment model was introduced to allow multiple Unified CCE systems to be interconnected in a single, seamless contact center environment managed by a single Unified ICM Enterprise system for enterprise-wide routing and reporting across multiple Unified CCE systems. This deployment model is also known as parent/child, where the Unified ICM acts as the parent controlling one or more Unified CCE System child IP ACDs. (See Figure 3-27.) In this model, the Unified ICM Enterprise system is designed to be the network call routing engine for the contact centers, with network queuing using the Unified CVP and Unified CCE Gateway Peripheral Gateways to connect child Unified CCE systems (either Unified CCE or Unified CCX). The child Unified CCE systems are individual IP-ACD systems, fully functional with local call processing in case they lose their WAN connection to the parent Unified ICM system. This configuration provides a high level of redundancy and availability to the Unified CCE solution to allow sites to remain functional as Unified CCE sites even if they are cut off from centralized call processing resources.

Figure 3-27 Parent/Child Deployment Model

Parent/Child Components

The following sections describe the components used in Unified ICM Enterprise (parent) and Unified CCE System (child) deployments.

The Unified ICM Enterprise (Parent) Data Center

The Unified ICM parent data center location contains the Unified ICM Central Controller. In Figure 3-27, it is shown as a redundant (duplex) pair of Roggers, which are a combination of Call Router and Logger servers. These servers can be deployed as individual Call Routers and Loggers for a larger deployment if needed, and they can also be deployed in two different data centers to be geographically distributed for additional fault tolerance.

The Unified ICM Roggers control Peripheral Gateways at the data center location. In Figure 3-27, there is only a redundant (duplex) pair of IVR PGs used to control the Unified CVP across the architecture. Additional PGs can be inserted at this layer to control TDM or legacy ACDs and IVRs, perhaps to support a migration to Unified CCE or to support out-source locations that still use the TDM or legacy ACDs. The Unified ICM parent at this level can also support standard pre-routing with inter-exchange carriers (IXCs) such as AT&T, Sprint, MCI, and others, thus allowing the Unified ICM to select the best target for the call while it is still in the carrier network.

The Unified ICM parent is not designed to support any directly controlled agents in this model, which means that it does not support classic Unified CCE with a Unified CM Peripheral Gateway installed on this Unified ICM parent. All agents must be controlled externally to this Unified ICM parent system.

The Unified CVP or IVR PG pair controls the Customer Voice Portal Call Server, which translates the IVR PG commands from Unified ICM into VoiceXML and directs the VoiceXML to the voice gateways at the remote contact center sites. This allows calls from the data center location to come into the remote call centers under control of the CVP at the parent location. The parent then has control over the entire network queue of calls across all sites and will hold the calls in queue on the voice gateways at the sites until an agent becomes available.

The Unified Contact Center Express (CCX) Call Center (Child) Site

The Unified Contact Center Express (CCX) Call Center location contains a local Unified CM cluster that provides local IP-PBX functionality and call control for the IP phones and local CVP voice gateway. There is also a local Unified CCX Server, Release 4.0 or above, that provides IP-ACD functionality for the site. The Unified CCX Server has the Unified CCE Gateway PG installed on it, which reduces the number of servers required to support this contact center site. The Unified CCE Gateway PG connects to the Unified ICM Call Router (Rogger) at the Unified ICM parent data center location over the WAN and provides real-time event data and agent states to the parent from the Unified CCX. The Unified CCE Gateway PG also captures configuration data (skill groups, CSQs, services, applications, and so forth) and sends it to the parent Unified ICM configuration database as well.

Additional Unified CCX servers may be used and included in this site to provide redundant Unified CCX Servers, historical reporting database services, recording and monitoring servers, and ASR/TTS servers as well.

The Unified CCE Call Center (Child) Site

The Unified CCE Call Center location contains a local Unified CM cluster that provides local IP-PBX functionality and call control for the IP phones and local CVP voice gateway. There is also a local Unified IP IVR to provide local call queuing for the Unified CCE site. There is a redundant pair of Unified CCE Gateway PGs that are used to connect this site to the Unified ICM parent Call Router (Rogger) at the Unified ICM parent data center location over the WAN and to provide real-time event data and agent states to the parent from the Unified CCE child. The Unified CCE Gateway PGs also capture configuration data (skill groups, services, call types, and so forth) and send it to the parent Unified ICM configuration database as well.

In Unified CCE 7.5(x), the IP-IVR at the Child site can be replaced with a local Unified CVP instance. Unified CVP is not integrated as part of the Agent Controller's System PG; there is a separate IVR PG defined specifically for Unified CVP as part of the installation for System CCE with Unified CVP. Because Unified CVP is not part of the System PG, calls in queue or treatment in Unified CVP will not be reported to the Parent ICM via the Unified CCE Gateway PG.

A local Unified CCE child system is used to provide IP-ACD functionality, and it can be sized depending upon the type of deployment required:

Progger configuration

Single (or duplex) server that contains the Unified CCE components: Call Router and Logger, System PG for Unified CM and IP IVR, CTI Server and CTI OS Server, and optionally the Unified CVP Controller.

Rogger configuration with separate Unified CCE Agent Controller (System PG and optional Unified CVP controller and CTI/CTI OS Server)

The Rogger configuration contains the Unified CCE components: Call Router and Logger as a single set of duplex Central Controllers, and a separate Agent Controller set of duplex servers that contain the System PG for Unified CM and IP IVR, CTI Server and CTI OS Server, and the optional Unified CVP Controller.


Note In Unified CCE 7.5(x), the Outbound Controller can be installed on the Agent Controller to add the Dialer and Media Routing (MR) PG to the same server as well.


For more details about the capacity of these configurations, refer to Sizing Unified CCE Components and Servers, page 10-1.

In either configuration, a separate Administrative Workstation Server is required to host the Web-based reporting (WebView), configuration (WebConfig), and scripting (Internet Script Editor) tools for the system, as well as a Historical Database Server option.


Note Cisco Agent Desktop (CAD) may be used with the Unified System CCE Child.


Parent/Child Call Flows

The following sections describe the call flows for the parent and child.

Typical Inbound PSTN Call Flow

In a typical inbound call flow from the PSTN, calls would be directed by the carrier network to the contact center sites using some predefined percent allocation or automatic routing method. These calls are terminated in the CVP voice gateways at the call center locations, under control of the Unified ICM parent CVP. The inbound call flow is as follows:

1. The call arrives on the CVP voice gateway at the Unified CCE call center location.

2. The CVP voice gateway maps the call by dialed number to a particular CVP Call Server at the Unified ICM parent site and sends a new call event to the CVP Call Server.

3. The CVP Call Server sends the new call event message to the CVP or IVR PG at the Unified ICM parent site.

4. The CVP PG sends the new call message to the Unified ICM parent, which uses the inbound dialed number to qualify a routing script to determine the proper call treatment (messaging) or agent groups to consider for the call.

5. Unified ICM instructs the CVP to hold the call in the voice gateway at the site and wait for an available agent, while directing specific instructions to play .wav files for hold music to the caller in the gateway.

6. When an agent becomes available, the Unified ICM instructs the CVP to transfer the call to the site with the available agent by using a translation route. (The agent might not be at the same physical site but across the WAN.) Any data collected about the call in the Unified ICM parent CVP will be transferred to the remote system's PG (either a TDM, legacy PG, or one of the Unified CCE Gateway PGs for Unified CCX or Unified CCE).

7. When the call arrives at the targeted site, it will arrive on a specific translation route DNIS that was selected by the Unified ICM parent. The PG at the site is expecting a call to arrive on this DNIS to match up with any pre-call CTI data associated with the call. The local ACD or Unified CCE will perform a post-route request to the local PG to request the CTI data as well as the final destination for the call (typically the lead number for the skill group of the available agent).

8. If the agent is no longer available for the call (walked away or unplugged), Unified CVP at the Parent site will use the Router Requery function in the ICM Call Routing Script to select another target for the call automatically.
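Steps 6 and 7 of the call flow above rely on a translation route: the parent selects a DNIS, associates the collected call data with it, and the PG at the target site matches the arriving call on that DNIS to retrieve the data. The following Python sketch illustrates only that matching idea. The class, its method names, and the sample DNIS values are hypothetical, and a real deployment also handles timeouts and other conditions that are not modeled here.

```python
class TranslationRoutePool:
    """Toy model of translation-route matching (hypothetical, not Cisco code)."""

    def __init__(self, dnis_pool):
        self._free = list(dnis_pool)  # DNIS values not currently in use
        self._pending = {}            # DNIS -> call data awaiting call arrival

    def reserve(self, call_data):
        """Parent side: pick a free DNIS and park the call data under it."""
        if not self._free:
            return None               # no translation route DNIS available
        dnis = self._free.pop(0)
        self._pending[dnis] = call_data
        return dnis

    def match_arrival(self, dnis):
        """Site PG side: a call arrived on dnis; return its pre-call data."""
        data = self._pending.pop(dnis, None)
        if data is not None:
            self._free.append(dnis)   # DNIS can be reused for another call
        return data


pool = TranslationRoutePool(["8001", "8002"])
dnis = pool.reserve({"account": "12345"})
print(dnis, pool.match_arrival(dnis))
```

A call that arrives on a DNIS with no pending data returns None in this model, which corresponds to a call for which the site PG has no pre-call CTI data to attach.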

Post-Route Call Flow

Post-routing is used when a call is already at a peripheral ACD or IVR and needs to be routed intelligently to another agent or location. If an agent gets a call in the ACD or Unified CCE that needs to be sent to a different skill group or location, the agent can make use of the post-route functionality to reroute the call. The post-route call flow is as follows:

1. The agent transfers the call to the local CTI route point for reroute treatment using the CTI agent desktop.

2. The reroute application or script makes a post-route request to the Unified ICM parent via the local Unified CCE Gateway PG connection.

3. The Unified ICM parent maps the CTI route point from the Unified CCE as the dialed number and uses that number to select a routing script. This script will return a label or routing instruction that can move the call to another site, or to the same site but into a different skill group, or to a CVP node for queueing.

4. The Unified CCE receives the post-route response from the Unified ICM parent system and uses the returned routing label as a transfer number to send the call to the next destination.
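The post-route steps above amount to a lookup: the CTI route point is treated as the dialed number, the dialed number selects a routing script, and the script returns a label that the child uses as the transfer number. The Python sketch below models this under stated assumptions. All function names, labels, and the sample route point are hypothetical, and the fallback label for an unreachable parent is an illustrative design choice rather than documented product behavior.

```python
def post_route(route_point, call_context, parent_scripts,
               parent_reachable, local_fallback):
    """Return the label the child uses as the transfer number."""
    if not parent_reachable:
        return local_fallback   # parent cannot answer; handle the call locally
    script = parent_scripts.get(route_point)
    if script is None:
        return local_fallback   # no script is mapped to this dialed number
    return script(call_context)


def sales_script(ctx):
    # Hypothetical routing script: prefer the sales skill group when an
    # agent is free, otherwise return a label for a queueing node.
    if ctx.get("sales_agents_free", 0) > 0:
        return "SALES_SG"
    return "CVP_QUEUE"


scripts = {"5000": sales_script}
print(post_route("5000", {"sales_agents_free": 2}, scripts, True, "LOCAL_SG"))
print(post_route("5000", {}, scripts, False, "LOCAL_SG"))
```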

Parent/Child Fault Tolerance

The parent/child model provides for fault tolerance to maintain a complete IP-ACD with either Unified CCX or Unified CCE deployed at the site, with local IP-PBX and call treatment and queueing functionality.

Unified CCE Child Loses WAN Connection to Unified ICM Parent

If the WAN between the Unified CCE child site and the Unified ICM parent fails, the local Unified CCE system will be isolated from the parent as well as the Unified CVP voice gateway. Calls coming into the site will no longer get treatment from the CVP under control of the Unified ICM parent, so the following functionality must be replicated locally, depending on the Child configuration.

For Unified CCE Child configurations using local IP IVR resources for queue and treatment:

The local voice gateway must have dial peer statements to pass control of the calls to the local Unified CM cluster if the Parent CVP Call Server cannot be reached. Also, the local Unified CM cluster must have CTI route points mapped to the inbound DNIS or dialed numbers that the local voice gateway will present if the Parent CVP Call Server is not reached.

The local IP IVR must be configured with appropriate .wav files and applications that can be called by the Unified CCE Child system locally to provide basic call treatment such as playing a welcome greeting or other message.

The Child CCE Routing Script must handle queueing of calls for agents in local skill groups, instructing the IP IVR to play treatment in-queue while waiting for an agent.

Any data lookup or external CTI access that is normally provided by the Parent CVP or the Parent Unified ICM must be provisioned locally to allow the agents to have full access to customer data for routing and screen pops.

Any post-routing transfer scripts will fail during this outage, so Unified CCE must be configured to handle this outage or prevent the post-route scripts from being accessed.

For Unified CCE Child configurations using local Unified CVP resources for queue and treatment with Unified CCE 7.5(x):

The local voice gateway must have dial peer statements to pass control of the calls to the local Unified CVP Call Server at the Child site. Also, the inbound DNIS or dialed numbers that the local voice gateway will present to the Child CVP must be configured in the Child CCE to process these calls locally at the Child.

The local VXML Gateways and CVP Call Servers must be configured with appropriate .wav files and applications that can be called by the Unified CCE Child system locally to provide basic call treatment such as playing a welcome greeting or other messages.

Self-service or CVP Studio VXML applications normally provided by the Parent ICM must be replicated using CVP VXML Server (web application server) at the Child to generate the dynamic VXML for these applications.

The Child CCE Routing Script must handle queueing of calls for agents in local skill groups, instructing the local Unified CVP at the Child to play treatment in-queue while waiting for an agent.

Any data lookup or external CTI access that is normally provided by the Parent CVP or the Parent Unified ICM must be provisioned locally to allow the agents to have full access to customer data for call routing and screen pops.

Any post-routing transfer scripts will fail during this outage, so Unified CCE must be configured to handle this outage or prevent the post-route scripts from being accessed.
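The two lists above can be treated as a survivability checklist for a Unified CCE child site. The Python sketch below shows one way a design team might track which items have been provisioned. The item names are hypothetical shorthand for the requirements listed above and do not correspond to product settings.

```python
# Hypothetical shorthand for the locally replicated functions listed above,
# keyed by the resource the child uses for queue and treatment.
REQUIRED_LOCAL_ITEMS = {
    "ip_ivr": [
        "gateway_dial_peers_to_local_ucm",
        "cti_route_points_for_inbound_dnis",
        "ip_ivr_prompts_and_applications",
        "child_script_queues_locally",
        "local_data_lookup",
        "post_route_outage_handling",
    ],
    "cvp": [
        "gateway_dial_peers_to_child_cvp",
        "dnis_configured_in_child_cce",
        "cvp_prompts_and_applications",
        "local_vxml_server_applications",
        "child_script_queues_locally",
        "local_data_lookup",
        "post_route_outage_handling",
    ],
}


def missing_items(child_type, provisioned):
    """Return the checklist items not yet provisioned, in checklist order."""
    return [item for item in REQUIRED_LOCAL_ITEMS[child_type]
            if item not in provisioned]


done = {"gateway_dial_peers_to_local_ucm", "local_data_lookup"}
for item in missing_items("ip_ivr", done):
    print("not yet provisioned:", item)
```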

Unified Contact Center Express Child Loses WAN Connection to Unified ICM Parent

If the WAN between the Unified Contact Center Express (CCX) child site and the Unified ICM parent fails, the local Unified CCX system will be isolated from the parent as well as the Unified CVP voice gateway. Calls coming into the site will no longer get treatment from the Unified CVP under control of the Unified ICM parent, so the following functionality must be replicated locally:

The local voice gateway must have dial peer statements to pass control of the calls to the local Unified CM cluster if the Parent CVP Call Server cannot be reached.

Unified CCX JTAPI applications have to be mapped to these CTI route points to provide any typical inbound call treatment, such as playing a welcome greeting or other message.

The application has to provide for call queueing and treatment in queue while waiting for a local Contact Service Queue (CSQ) agent.

Any data lookup or external CTI access that is normally provided by the Parent CVP or the Parent Unified ICM must be provisioned locally to allow the agents to have full access to customer data for call routing and screen pops.

Any post-routing applications or transfer scripts will fail during this outage, so the Unified CCX must be configured to handle this outage or prevent the post-route applications from being accessed.

A similar failure would occur if the local Unified CVP ingress voice gateways controlled by the Parent CVP Call Server could not see the Unified ICM Parent CVP Call Servers. The local Unified CVP gateways would be configured to fail over to the local Unified CM (or Child CVP) to route calls to the Unified CCE agents as described above. Likewise, if the entire Unified ICM parent were to fail, the local voice gateways controlled by the Parent CVP at the sites would no longer have call control from the Unified ICM parent, and calls would forward to the local sites for processing.

Unified CCE Gateway PG Fails or Cannot Communicate with Unified ICM Parent

If the Unified CCE gateway PG fails or cannot communicate with the Unified ICM parent, the local agents are no longer seen as available to the Unified ICM parent, but the inbound calls to the site may still be under control of the Unified ICM parent CVP. In this case, the Unified ICM parent will not know if the remote Unified CCE gateway PG has failed or if the actual Unified CCE IP-ACD has failed locally.

The Unified ICM at the parent location can automatically route around this site, considering it down until the PG comes back online and reports agent states again. Alternatively, the Unified ICM can also direct a percentage of calls as blind transfers to the site Unified CCE IP-ACD using the local inbound CTI route points on the Unified CM. This method would present calls with no CTI data from the CVP, but it would allow the agents at the site to continue to get calls locally with their Unified CCE system.

If the local Unified CCE child system were to fail, the Unified CCE gateway PG would not be able to connect to it, and the Unified ICM parent would then consider all of the agents as being off-line and not available. If calls were sent to the local Unified CM while the child Unified CCE system was down, the call-forward-on-failure processing would take over the call for the CTI route point. This method would redirect the call to another site or an answering resource to play a message telling the caller there was an error and to call again later.
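The two options described above (routing around a site whose gateway PG is offline, or sending a percentage of calls to it as blind transfers) can be sketched as a single routing decision. The Python example below is a simplified model, not an ICM routing script: the function, the site dictionary layout, and the "most available agents" selection rule are all assumptions made for illustration.

```python
import random


def route_call(sites, blind_pct=0.0, rng=random):
    """Pick a site for a call; return (site_name, has_cti_data).

    sites     -- dict: name -> {"pg_online": bool, "agents_available": int}
    blind_pct -- percentage of calls sent blind to sites whose PG is offline
    """
    online = {name: s for name, s in sites.items()
              if s["pg_online"] and s["agents_available"] > 0}
    offline = [name for name, s in sites.items() if not s["pg_online"]]

    # Blind transfer: the parent cannot see agent states, and the call
    # arrives at the site without CTI data.
    if offline and rng.random() * 100 < blind_pct:
        return rng.choice(offline), False

    # Otherwise route around offline sites and use reported agent states.
    if online:
        best = max(online, key=lambda name: online[name]["agents_available"])
        return best, True

    return None, False  # no target yet; the call stays in queue


sites = {
    "east": {"pg_online": True, "agents_available": 3},
    "west": {"pg_online": False, "agents_available": 0},
}
print(route_call(sites))                 # routes around "west"
print(route_call(sites, blind_pct=100))  # always blind-transfers to "west"
```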

Parent/Child Reporting and Configuration Impacts

During any time that the Unified CCE child is disconnected from the Unified ICM parent, the local IP-ACD is still collecting reporting data and allows local users to make changes to the child routing scripts and configuration. The Unified CCE gateway PG at the child site will cache these objects and store them in memory (and eventually to disk) to be sent later to the Unified ICM parent when it is available. This functionality is available only if the Unified CCE gateway PG is co-located at the child Unified CCE site.
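The caching behavior described above is a store-and-forward pattern: records are held in order while the parent is unreachable and are sent once the connection returns. The Python sketch below illustrates the pattern in memory only. The class name and the send callback are hypothetical, and the spill to disk mentioned in the text is not modeled.

```python
from collections import deque


class StoreAndForward:
    """In-memory store-and-forward queue (illustrative, not Cisco code)."""

    def __init__(self, send):
        # send(record) -> bool; True means the parent accepted the record.
        self._send = send
        self._pending = deque()

    def submit(self, record):
        """Queue a record and try to deliver everything that is pending."""
        self._pending.append(record)
        return self.flush()

    def flush(self):
        """Send pending records in order; return how many remain queued."""
        while self._pending:
            if not self._send(self._pending[0]):
                break                  # parent unreachable; retry later
            self._pending.popleft()    # remove only after a successful send
        return len(self._pending)
```

Removing a record only after the send succeeds preserves ordering and avoids losing a record if the link drops partway through a flush.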

Other Considerations for the Parent/Child Model

Multi-channel components such as Cisco Email Manager, Web Collaboration or EIM/WIM, and Unified Outbound Option may be installed only at the child Unified CCE level, not at the parent. They are treated as nodal implementations on a site-by-site basis.

Other Considerations for High Availability

A Unified CCE failover can affect other parts of the solution. Although Unified CCE may stay up and running, some data could be lost during its failover, or other products that depend on Unified CCE to function properly might not be able to handle a Unified CCE failover. This section examines what happens to other critical areas in the Unified CCE solution during and after failover.

Reporting

The Unified CCE reporting feature uses real-time, five-minute and half-hour intervals to build its reporting database. Therefore, at the end of each five-minute and half-hour interval, each Peripheral Gateway will gather the data it has kept locally and send it to the Call Routers. The Call Routers process the data and send it to their local Logger and Database Servers for historical data storage. If the deployment has the Historical Data Server (HDS) option, that data is then replicated to the HDS from the Logger as it is written to the Logger database.
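As an illustration of the interval boundaries mentioned above, the following Python sketch computes the end of the five-minute or half-hour interval that contains a given timestamp. It assumes that intervals are aligned to the top of the hour; the function name is hypothetical, and the calculation is valid only for interval lengths that divide 60 evenly.

```python
from datetime import datetime, timedelta


def interval_end(ts, minutes):
    """End of the reporting interval of length `minutes` that contains ts.

    Assumes intervals are aligned to the top of the hour and that
    `minutes` divides 60 evenly (for example, 5 or 30).
    """
    start = ts.replace(minute=ts.minute - ts.minute % minutes,
                       second=0, microsecond=0)
    return start + timedelta(minutes=minutes)


ts = datetime(2024, 1, 1, 10, 17, 30)
print(interval_end(ts, 5))   # 2024-01-01 10:20:00
print(interval_end(ts, 30))  # 2024-01-01 10:30:00
```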

The Peripheral Gateways provide buffering (in memory and on disk) of the five-minute and half-hour data collected by the system to handle network connectivity failures or slow network response as well as automatic retransmission of data when the network service is restored. However, physical failure of both Peripheral Gateways in a redundant pair can result in loss of the half-hour or five-minute data that has not been transmitted to the Central Controller. Cisco recommends the use of redundant Peripheral Gateways to reduce the chance of losing both physical hardware devices and their associated data during an outage window.

When agents log out, all their reporting statistics stop. The next time the agents log in, their real-time statistics start from zero. Typically, a Unified ICM Central Controller failover does not force the agents to log out or reset their statistics; however, if the PG fails over, the agent statistics are reset because the PIM and OPC processes that maintain these values in memory are restarted. If the CTI OS or CAD servers do not fail over or restart, the agent desktop functionality is restored to its pre-failover state.

For further information, refer to the Reporting Guide for Cisco IPCC Enterprise & Hosted Editions, available at

http://www.cisco.com/en/US/products/sw/custcosw/ps4145/products_user_guide_list.html