Cisco Unified Contact Center Enterprise Solution Reference Network Design, Release 9.x
Design Considerations for High Availability


Note


Many of the design considerations and illustrations throughout this chapter have been revised and updated. Review the entire chapter before designing a Unified CCE system.


Designing for High Availability

Cisco Unified CCE is a distributed solution that uses numerous hardware and software components. Design each system to eliminate any single point of failure, or at least to contain potential failures so that they impact the fewest resources in the contact center. The type and number of resources impacted depends on how stringent your requirements are, the budget for fault tolerance, and the design characteristics you choose for the various Unified CCE components (including the network infrastructure). A good Unified CCE design tolerates most failures (defined later in this section), but not all failures can be made transparent.

  • Cisco Unified CCE is a solution designed for mission-critical contact centers. The successful design of any Unified CCE deployment requires a team with experience in data and voice internetworking, system administration, and Unified CCE application design and configuration.
  • Simplex deployments are allowed for demo, laboratory, and non-production deployments. However, all production deployments must be deployed with redundancy for the core Unified CCE components (Call Routers, Loggers, PGs, and pre-routing gateways).

Before implementing Unified CCE, use careful preparation and design planning to avoid costly upgrades or maintenance later in the deployment cycle. Always design for the worst possible failure scenario with future scalability in mind for all Unified CCE sites.

In summary, plan ahead and follow all the design guidelines presented in this guide and in the Cisco Unified Communications Solution Reference Network Design (SRND) Guide at http://www.cisco.com/en/US/products/sw/voicesw/ps556/tsd_products_support_series_home.html.

For assistance in planning and designing your Unified CCE solution, consult your Cisco or certified Partner Systems Engineer (SE).

The figure below shows a high-level design for a fault-tolerant Unified CCE single-site deployment.

Figure 1. Unified CCE Single-Site Design for High-availability

In the figure above, each component in the Unified CCE solution is duplicated with a redundant or duplex component, with the exception of the intermediate distribution frame (IDF) switches for the Unified CCE agents and their phones. The IDF switches do not interconnect with each other but only with the main distribution frame (MDF) switches, because it is better to distribute the agents among different IDF switches for load balancing and geographic separation (such as different building floors or different cities). If an IDF switch fails, route all calls to available agents on a separate IDF switch or to a Unified IP IVR queue. Follow the design guidelines for a single-site deployment as documented in the Cisco Unified Communications Solution Reference Network Design (SRND) Guide at http://www.cisco.com/en/US/products/sw/voicesw/ps556/tsd_products_support_series_home.html.

If designed correctly for high-availability and redundancy, a Unified CCE system can lose half of its core component systems or servers and still be operational. With this type of design, no matter what happens in the Unified CCE system, calls can still be handled in one of the following ways:

  • Routed and answered by an available Unified CCE agent using an IP phone or desktop soft phone
  • Sent to an available Unified IP IVR or Unified CVP port or session
  • Answered by the Cisco Unified Communications Manager AutoAttendant or Hunt Group
  • Prompted by a Unified IP IVR or Unified CVP announcement that the call center is currently experiencing technical difficulties and to call back later
  • Rerouted to another site with available agents or resources to handle the call

The components in the figure above can be rearranged to form two connected Unified CCE sites, as illustrated in the figure below.

Figure 2. Unified CCE Single-Site Redundancy

The figure above emphasizes the redundancy of the single-site design shown in Figure 1. Side A and Side B are essentially mirror images of each other. In fact, one of the main Unified CCE features that enhances high availability is the capability to add redundant or duplex components that automatically fail over and recover without any manual intervention. Core system components with redundant mates are interconnected so that each can detect failure of the other, using TCP keep-alive messages generated every 100 ms over a separate Private Network path. The fault-tolerant design and the failure detection and recovery method are described later in this chapter.

Other components in the solution use other types of redundancy strategies. For example, Cisco Unified Communications Manager (Unified CM) uses a cluster design that provides IP phones and devices with multiple Unified CM subscribers (servers) to register with when the primary server fails. The devices automatically reconnect to the primary server when it is restored.

The following sections use the design in Figure 1 as the model for discussing issues and features to consider when designing Unified CCE for high availability. These sections use a bottom-up model (from a network model perspective, starting with the physical layer) that divides the design into segments that can be deployed in separate stages.

Use only duplex (redundant) Unified CM, Unified IP IVR or Unified CVP, and Unified CCE components for all Unified CCE deployments. This chapter assumes that the Unified CCE fail-over feature is a critical requirement for all deployments; therefore, it presents only deployments that use a redundant configuration, with each Unified CM cluster having at least one publisher and one subscriber. Additionally, where possible, deploy Unified CCE so that no devices, call processing, or CTI Manager services run on the Unified CM publisher.

Data Network Design Considerations

The Unified CCE design shown in the figure below illustrates the voice call path from the PSTN (public switched telephone network) at the ingress Voice Gateway to the call reaching a Unified CCE agent. The network infrastructure in the design supports the Unified CCE environment for data and voice traffic. The network, including the PSTN, is the foundation for the Unified CCE solution. If the network is poorly designed to handle failures, then everything in the contact center is prone to failure because all the servers and network devices depend on the network for highly available communications. The data and voice networks must be a primary part of your solution design and must be addressed in the early stages for all Unified CCE implementations.

For all the Unified CCE core component servers, set the server NIC and the Ethernet switch port to 100 Mb full duplex for 10/100 links, or set them to auto-negotiate for Gigabit links.
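As a sketch of this setting on a Cisco Catalyst access switch running IOS (the interface numbers and port descriptions are hypothetical, and exact syntax varies by platform and release), the port configuration might look like:

```
! Hypothetical example: force 100 Mb full duplex on an access port
! connected to a Unified CCE server NIC over a 10/100 link.
interface FastEthernet0/10
 description Unified CCE Router Side A - public NIC
 speed 100
 duplex full
!
! For Gigabit links, leave both ends at auto-negotiation.
interface GigabitEthernet0/1
 description Unified CCE Logger Side A - public NIC
 speed auto
 duplex auto
```

Whatever values you choose, configure the server NIC and the switch port identically; a duplex mismatch causes intermittent packet loss that is difficult to diagnose.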

In addition, the choice of Voice Gateways for a deployment is critical because some protocols offer more call resiliency than others. This chapter provides high-level information about how to configure the voice gateways for high-availability with the Unified CCE solution.

For more information about Voice Gateways and voice networks in general, see the Cisco Unified Communications Solution Reference Network Design (SRND) Guide at http://www.cisco.com/en/US/products/sw/voicesw/ps556/tsd_products_support_series_home.html.

Figure 3. High-availability in a Network with Two Voice Gateways and One Unified CM Cluster

The use of multiple Voice Gateways avoids the problem of a single gateway failure blocking all inbound and outbound calls. In a configuration with two Voice Gateways and one Unified CM cluster, register each gateway with a different primary Unified CM subscriber to spread the workload across the subscribers in the cluster. Configure each gateway to use another subscriber as a backup in case its primary fails. For details on setting up Unified CM for redundant service and redundancy groups related to call processing, see the Cisco Unified Communications Solution Reference Network Design (SRND) Guide at http://www.cisco.com/en/US/products/sw/voicesw/ps556/tsd_products_support_series_home.html.

With Cisco IOS Voice Gateways using H.323 or SIP, additional call processing is available through TCL scripts and additional dial peers if the gateway is unable to reach its Unified CM for call control or call processing instructions. MGCP gateways do not have this built-in functionality, and the trunks terminated on these gateways require backup routing or "roll-over service" from the PSTN carrier or service provider to reroute the trunk to another gateway or location on failure or no-answer.
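A minimal sketch of the dial-peer portion of this fallback on a Cisco IOS gateway follows; the IP addresses and the dialed-number pattern are assumptions, and a production configuration would add the matching TCL survivability service on the inbound dial peer:

```
! Hypothetical example: if the primary Unified CM subscriber does not
! answer, the gateway hunts to the lower-preference dial peer.
dial-peer voice 100 voip
 description Primary Unified CM subscriber
 destination-pattern 5...
 session target ipv4:10.10.10.11
 preference 1
!
dial-peer voice 101 voip
 description Backup Unified CM subscriber
 destination-pattern 5...
 session target ipv4:10.10.10.12
 preference 2
```

Lower `preference` values are tried first, so the gateway attempts 10.10.10.11 and falls through to 10.10.10.12 only when the primary is unreachable.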

When sizing the gateways' trunk capacity, account for gateway fail-over by building in enough excess capacity to handle the maximum busy hour call attempts (BHCA) if one or more Voice Gateways fail. During the design phase, first decide how many simultaneous Voice Gateway failures are possible and acceptable for the site. Based on this requirement, the number of Voice Gateways used, and the distribution of trunks across those Voice Gateways, you can determine the total number of trunks required for normal and disaster modes of operation. The more you distribute the trunks over multiple Voice Gateways, the fewer trunks you need in a failure mode. However, using more Voice Gateways or carrier PSTN trunks increases the cost of the solution, so compare the cost with the benefit of being able to service calls during a gateway failure. The form factor of the gateway is also a consideration.

As an example, assume a contact center has a maximum BHCA that results in the need for four T1 lines, and the company has a requirement for no call blockage in the event of a single component (Voice Gateway) failure. If two Voice Gateways are deployed, then provision each Voice Gateway with four T1 lines (a total of eight). If three Voice Gateways are deployed, then two T1 lines per Voice Gateway (a total of six) are enough to achieve the same level of redundancy. If five Voice Gateways are deployed, then one T1 per Voice Gateway (a total of five) is enough to achieve the same level of redundancy. Thus, you can reduce the number of T1 lines required by adding more Voice Gateways and spreading the risk over multiple physical devices.
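The T1 counts in this example follow from a simple over-provisioning formula. With N deployed Voice Gateways, a tolerance of F simultaneous gateway failures, and T_req trunks required to carry the maximum BHCA, the surviving N - F gateways must carry the full load:

```latex
T_{\text{per gateway}} = \left\lceil \frac{T_{\text{req}}}{N - F} \right\rceil,
\qquad
T_{\text{total}} = N \times T_{\text{per gateway}}
```

With T_req = 4 T1 lines and F = 1, this gives 2 x 4 = 8 T1 lines for two gateways, 3 x 2 = 6 for three, and 5 x 1 = 5 for five, matching the example above.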

The operational cost savings of fewer T1 lines might be greater than the one-time capital cost of the additional Voice Gateways. In addition to the recurring operational costs of the T1 lines, also factor in the carrier charges (like the typical one-time installation cost) of the T1 lines to ensure that your design accounts for the most cost-effective solution. Every installation has different availability requirements and cost metrics, but using multiple Voice Gateways is often more cost-effective. It is a worthwhile design practice to perform this cost comparison.

After you have determined the number of trunks needed, the PSTN service provider has to configure them so that calls can be terminated onto trunks connected to all of the Voice Gateways (or at least more than one Voice Gateway). From the PSTN perspective, if the trunks going to the multiple Voice Gateways are configured as a single large trunk group, then all calls are automatically routed to the surviving Voice Gateways when one Voice Gateway fails. If all of the trunks are not grouped into a single trunk group within the PSTN, then you must ensure that PSTN rerouting or overflow routing to the other trunk groups is configured for all dialed numbers.

If a Voice Gateway with a digital interface (T1 or E1) fails, then the PSTN automatically stops sending calls to that Voice Gateway because carrier level signaling on the digital circuit has dropped. The loss of carrier level signaling on a digital circuit causes the PSTN to busy-out all trunks, thereby preventing the PSTN from routing new calls to the failed Voice Gateway. When the failed Voice Gateway comes back on-line and the circuits are back in operation, the PSTN automatically starts delivering calls to that Voice Gateway again.

With Cisco IOS Voice Gateways using H.323 or SIP, it is possible for the Voice Gateway itself to be operational but for its communication paths to the Unified CM servers to be severed (for example, by a failed Ethernet connection). To guard against this situation, use the busyout-monitor interface voice-port configuration command to place a voice port into a busyout monitor state tied to an Ethernet interface on the Voice Gateway; use the no form of the command to remove the busyout-monitor state from the voice port.
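A minimal sketch of this configuration follows; the voice-port and interface numbers are assumptions, and the exact command syntax varies by IOS release:

```
! Hypothetical example: busy out the T1 PRI voice port toward the PSTN
! whenever the gateway's LAN uplink toward Unified CM goes down, so the
! carrier stops delivering calls that the gateway cannot process.
voice-port 1/0:23
 busyout monitor interface GigabitEthernet0/0
```

When the monitored interface goes down, the PSTN sees the trunk as busied out and reroutes calls to the surviving gateways; when the interface recovers, the port returns to service automatically.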

As noted previously, these gateways also provide additional processing options if the call control interface is not available from Unified CM to reroute the calls to another site or dialed number or to play a locally stored .wav file to the caller and end the call.

With MGCP-controlled Voice Gateways, when the Voice Gateway interface to Unified CM fails, the gateway looks for the secondary and tertiary Unified CM subscribers in its redundancy group. The MGCP gateway automatically fails over to the other subscribers in the group and periodically checks the health of each subscriber, marking a subscriber as available when it comes back on-line. The gateway then fails back to the primary subscriber when all calls are idle or after 24 hours, whichever comes first.

If no subscribers are available, the Voice Gateway automatically busies-out all its trunks. This action prevents new calls from being routed to this Voice Gateway from the PSTN. When the Voice Gateway interface to Unified CM homes to the backup subscriber, the trunks are automatically idled and the PSTN begins routing calls to this Voice Gateway again (assuming the PSTN has not permanently busied-out those trunks). The design practice is to spread the gateways across the Unified CM call processing servers in the cluster to limit the risk of losing all the gateway calls in a call center if the primary subscriber that has all the gateways registered to it fails.
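The gateway side of this MGCP redundancy can be sketched as follows; the call-agent and redundant-host IP addresses are assumptions:

```
! Hypothetical example: MGCP gateway homed to a primary Unified CM
! subscriber with two backups from the redundancy group.
ccm-manager mgcp
ccm-manager redundant-host 10.10.10.12 10.10.10.13
!
mgcp
mgcp call-agent 10.10.10.11 service-type mgcp version 0.1
```

The `ccm-manager redundant-host` command lists the backup subscribers the gateway re-homes to when the primary call agent stops responding, matching the fail-over and fail-back behavior described above.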

Voice Gateways that are used with the Cisco Unified Survivable Remote Site Telephony (SRST) option for Unified CM follow a similar fail-over process. If the gateway is cut off from the Unified CM that is controlling it, the gateway drops all active voice calls and resets into SRST mode. Phones then re-home to the local SRST gateway for call control, and calls are processed locally and directed to local phones.

While the site is running in SRST mode, the agents are assumed to have no CTI connection from their desktops. They are seen as not ready by the Unified CCE routing application, and Unified CCE sends no calls to these agents. When the data connection to the gateway at the site is re-established, Unified CM takes control of the gateway and phones again, allowing the agents to reconnect to Unified CCE.
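The branch-gateway side of SRST can be sketched as follows; the source address and phone counts are assumptions sized for a small branch:

```
! Hypothetical example: SRST fallback on a branch Voice Gateway.
! Phones that lose their Unified CM subscribers re-register here.
call-manager-fallback
 ip source-address 10.20.30.1 port 2000
 max-ephones 24
 max-dn 48
```

The `ip source-address` is the local address phones register to during fallback; `max-ephones` and `max-dn` cap the phones and directory numbers the gateway will support in SRST mode and must be sized for the site.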

Unified CM and CTI Manager Design Considerations

Cisco Unified CM handles all of its CTI resources through CTI Manager, a service that acts as an application broker and abstracts the physical binding of the application to a particular Unified CM server. (See the Cisco Unified Communications Solution Reference Network Design (SRND) Guide at http://www.cisco.com/en/US/products/sw/voicesw/ps556/tsd_products_support_series_home.html for further details about the architecture of the CTI Manager.) The CTI Manager and Unified CM are two separate services running on a Unified CM server. Other services running on a Unified CM server include the TFTP, Cisco Messaging Interface, and Real-time Information Server (RIS) Data Collector services.

The main function of the CTI Manager is to accept messages from external CTI applications and send them to the appropriate resource in the Unified CM cluster. The CTI Manager uses the Cisco JTAPI link to communicate with the applications; it acts like a JTAPI messaging router. The Cisco JTAPI client library connects to the CTI Manager instead of connecting directly to the CallManager service. In addition, multiple CTI Manager services can run on different Unified CM servers in the cluster and are aware of each other (by way of the CallManager service, as explained later in this section). The CTI Manager uses the same Signal Distribution Layer (SDL) signaling mechanism that the Unified CM services in the cluster use to communicate with each other. However, the CTI Manager does not directly communicate with the other CTI Managers in its cluster. (This is also explained in detail later.)

The main function of the CallManager service is to register and monitor all the Cisco Unified Communications devices. It basically acts as a switch for all the Cisco Unified Communications resources and devices in the system while the CTI Manager service acts as a router for all the CTI application requests for the system devices. Some of the devices that can be controlled by JTAPI that register with the CallManager service include the IP phones, CTI ports, and CTI route points.

The figure below illustrates some of the functions of Unified CM and the CTI Manager.

Figure 4. Functions of the CallManager and CTI Manager Services

The servers in a Unified CM cluster communicate with each other using the Signal Distribution Layer (SDL) service. SDL signaling is used only by the CallManager service to talk to the other CallManager services and keep everything in sync within the Unified CM cluster. The CTI Managers in the cluster are completely independent and do not establish a direct connection with each other. A CTI Manager routes external CTI application requests only to the appropriate devices serviced by the local CallManager service on its own subscriber. If the device does not reside on the local Unified CM subscriber, then the CallManager service forwards the application request to the appropriate Unified CM in the cluster. The figure below shows the flow of a device request to another Unified CM in the cluster.

Figure 5. CTI Manager Device Request to a Remote Unified CM

Although it might be tempting to register all of the Unified CCE devices to a single subscriber in the cluster and point the Peripheral Gateway (PG) to that server, this configuration puts a high load on that subscriber. If the PG were to fail in this case, the duplex PG would connect to a different subscriber and all the CTI Manager messaging would have to be routed across the cluster to the original subscriber. It is important to distribute devices and CTI applications appropriately across all the call processing nodes in the Unified CM cluster to balance the CTI traffic and limit possible failover conditions.

The external CTI applications use a CTI-enabled user account in Unified CM. They log into the CTI Manager service to establish a connection and assume control of the Unified CM devices associated to this specific CTI-enabled user account, typically referred to as the JTAPI user or PG user. In addition, given that the CTI Managers are independent from each other, any CTI application can connect to any CTI Manager in the cluster to perform its requests. However, because the CTI Managers are independent, one CTI Manager cannot pass the CTI application to another CTI Manager upon failure. If the first CTI Manager fails, the external CTI application must implement the fail-over mechanism to connect to another CTI Manager in the cluster.

For example, the Agent PG handles fail-over for the CTI Manager by using its duplex servers, Side A and Side B, each of which points to a different subscriber in the cluster and uses the CTI Manager on that subscriber. Note that these connections from the PG are managed in hot-standby mode, which means that only one side of the PG is active at any given time and is connected to the CTI Manager on its subscriber.

The PG processes are designed to prevent both sides from being active at the same time, to reduce the impact of the CTI application on Unified CM. Additionally, both of the duplex PG servers (Side A and Side B) use the same CTI-enabled JTAPI or PG user to log into the CTI Manager service. However, only one Unified CM PG side allows the JTAPI user to register and monitor the user devices, to conserve system resources in the Unified CM cluster. The other side of the Unified CM PG stays in hot-standby mode, waiting to connect, log in, register, and be activated upon failure of the active side.

The figure below shows two external CTI applications using the CTI Manager: the Agent PG and the Unified IP IVR. The Unified CM PG logs into the CTI Manager using the JTAPI account User 1, while the Unified IP IVR uses account User 2. Each external application uses its own specific JTAPI user account and has different devices registered and monitored by that user. For example, the Unified CM PG (User 1) monitors all four agent phones and the inbound CTI Route Points, while the Unified IP IVR (User 2) monitors its CTI Ports and the CTI Route Points used for its JTAPI Triggers. Although multiple applications can monitor the same devices, avoid this method because it can cause race conditions between applications trying to take control of the same physical device.

Figure 6. CTI Application Device Registration

Unified CM CTI applications also add to the device weights on the subscribers because of the memory objects used to monitor registered devices. These monitors are registered on the subscriber that has the connection to the external application. It is good design practice to distribute the applications' CTI Manager registrations across multiple subscribers to avoid overloading a single subscriber with all of the monitored-object tracking.

Perform the design of Unified CM and CTI Manager as the second design stage, right after the network design stage, and deploy in the same order. The reason for this order is that the Cisco Unified Communications infrastructure must be in place to dial and receive calls using its devices before you can deploy any telephony applications.

Before moving to the next design stage, make sure that a PSTN phone can call an IP phone and that this same IP phone can dial out to a PSTN phone with all the call survivability capabilities considered for treating these calls. Also keep in mind that the Unified CM cluster design is paramount to the Unified CCE system, and any server failure in a cluster takes down two services (CTI Manager and Unified CM), thereby adding an extra load to the remaining servers in the cluster.

Configuring the Unified CCE Peripheral Gateway for CTI Manager Redundancy

To enable Unified Communications Manager support for CTI Manager fail-over in a duplex Unified CCE Peripheral Gateway model, perform the following steps:
Procedure
    Step 1   Create a Unified Communications Manager redundancy group and add subscribers to the group. (Do not use Publishers and TFTP servers for call processing, device registration, or CTI Manager functions.)
    Step 2   Designate two CTI Managers on different subscribers to be used for each side of the duplex Peripheral Gateway (PG), one for PG Side A and one for PG Side B.
    Step 3   Assign one of the CTI Managers to be the JTAPI service of the Unified Communications Manager PG Side A. Note that the setup panel on the left is for Side A of the Peripheral Gateway. It points to the CCM1 subscriber and uses the PGUser CTI-enabled user account on the Unified Communications Manager cluster.
    Figure 7. Assigning CTI Managers for PG Sides A and B

    Step 4   Assign the second CTI Manager to be the JTAPI service of the Unified Communications Manager PG Side B. Note that the setup panel on the right is for Side B of the Peripheral Gateway. It points to the CCM2 subscriber and uses the same PGUser CTI-enabled user account on the Unified Communications Manager cluster. Both sides of the duplex PG pair must use the same JTAPI user to monitor the same devices from either side of the PG pair.

Unified IP IVR Design Considerations

The JTAPI subsystem in Unified IP IVR can establish connections with two CTI Managers on different subscribers in the Unified CM cluster. This feature enables Unified CCE designs to add Unified IP IVR redundancy at the CTI Manager level, similar to the Unified CCE Peripheral Gateway connections. Additionally, deploy multiple, redundant Unified IP IVR servers in the design and allow the Unified CCE call routing script to load-balance calls automatically between the available Unified IP IVR resources.

The figure below shows two Unified IP IVR servers configured for redundancy within one Unified CM cluster. Configure the Unified IP IVR group so that each server is connected to a different CTI Manager service on different Unified CM subscribers in the cluster for high availability. Using the redundancy feature of the JTAPI subsystem in the Unified IP IVR server, you implement redundancy by adding the IP addresses or host names of two Unified CMs from the cluster. Then, if one of the Unified CMs fails, the Unified IP IVR associated with that Unified CM fails over to the second Unified CM.

Figure 8. High-availability with Two Unified IP IVR Servers and One Unified CM Cluster

Unified IP IVR High-availability Using Unified CM

You can implement Unified IP IVR port high availability by using any of the following call-forward features in Unified CM:

  • Forward Busy — forwards calls to another port or route point when Unified CM detects that the port is busy. This feature can be used to forward calls to another resource when a Unified IP IVR CTI port is busy due to a Unified IP IVR application problem, such as running out of available CTI ports.
  • Forward No Answer — forwards calls to another port or route point when Unified CM detects that a port has not picked up a call within the timeout period set in Unified CM. This feature can be used to forward calls to another resource when a Unified IP IVR CTI port is not answering due to a Unified IP IVR application problem.
  • Forward on Failure — forwards calls to another port or route point when Unified CM detects a port failure caused by an application error. This feature can be used to forward calls to another resource when a Unified IP IVR CTI port has failed due to an application error.

When using the call-forward features to implement high availability of Unified IP IVR ports, avoid creating a loop in the event that all the Unified IP IVR servers are unavailable. That is, do not establish a forwarding path back to the first CTI port that initiated the call forwarding.

Unified IP IVR High-availability Using Unified CCE Call Flow Routing Scripts

You can implement Unified IP IVR high availability through Unified CCE call flow routing scripts. You can prevent calls from queuing to an inactive Unified IP IVR by using the Unified CCE scripts to check the Unified IP IVR peripheral status before sending calls to it. For example, you can program a Unified CCE script to check whether the Unified IP IVR is active by using an IF node, or you can configure a Translation Route to VRU node (use the consider if field) to select the Unified IP IVR with the most idle ports and distribute the calls evenly on a call-by-call basis. This method can be extended to load-balance ports across multiple Unified IP IVRs, and it can address all of the Unified IP IVRs on the cluster in the same Translation Route or Send to VRU node.


Note


All calls in progress at a Unified IP IVR are dropped if that Unified IP IVR server fails. It is important to distribute calls across multiple Unified IP IVR servers to minimize the impact of such a failure. Unified IP IVR provides a default script to handle cases where it loses the link to the IVR Peripheral Gateway, so that those calls are not lost.


Cisco Unified Customer Voice Portal (Unified CVP) Design Considerations

Unified CVP can be deployed with Unified CCE as an alternative to Unified IP IVR for call treatment and queuing. Unified CVP differs from Unified IP IVR in that it does not rely on Unified CM for JTAPI call control. Unified CVP uses H.323 or SIP for call control and can be deployed in front of Unified CM or other PBX systems as part of a hybrid Unified CCE or migration solution.

Figure 9. High-availability with Two Unified CVP Call Control Servers Using H.323

    Unified CVP uses the following system components:

    • Cisco Voice Gateway The Cisco Voice Gateway is typically used to terminate TDM PSTN trunks and calls to transform them into IP-based calls on an IP network. Unified CVP uses specific Cisco IOS Voice Gateways that support H.323 and SIP to enable more flexible call control models outside of the Unified CM MGCP control model. H.323 and SIP protocols enable Unified CVP to integrate with multiple IP and TDM architectures for Unified CCE. Voice gateways controlled by Unified CVP also provide additional functionality using the Cisco IOS built-in Voice Extensible Markup Language (VoiceXML) Browser to provide caller treatment and call queuing on the Voice Gateway without having to move the call to a physical device such as the IP-IVR or a third-party IVR platform. Unified CVP can also leverage the Media Resource Control Protocol (MRCP) interface of the Cisco IOS Voice Gateway to add automatic speech recognition (ASR) and text-to-speech (TTS) functions on the gateway under Unified CVP control.
    • Unified CVP Call Server The Unified CVP Call Server provides call control signaling when calls are switched between the ingress gateway and another endpoint gateway or a Unified CCE agent. It also provides the interface to the Unified CCE VRU Peripheral Gateway and translates specific Unified CCE VRU commands into VoiceXML code that is rendered on the Unified CVP Voice Gateway. The Call Server can communicate with the gateways using H.323 or SIP as part of the solution.
    • Unified CVP Media Server The Unified CVP caller treatment is provided either by using ASR/TTS functions through MRCP or with predefined .wav files stored on media servers. The media servers act as web servers and serve up the .wav files to the voice browsers as part of their VoiceXML processing. Media servers can be clustered using the Cisco Content Services Switch (CSS) products allowing multiple media servers to be pooled behind a single URL for access by all the voice browsers in the network.
    • Unified CVP VXML Application Server Unified CVP provides a VoiceXML service creation environment using an Eclipse toolkit browser which is hosted in the Unified CVP Call Studio Application. The Unified CVP VXML server hosts the Unified CVP VoiceXML runtime environment where the dynamic VoiceXML applications are executed and Java and Web Services calls are processed for external systems and database access.
    • H.323 Gatekeepers Gatekeepers are used with Unified CVP to register the voice browsers and associate them with specific dialed numbers. When a call comes into the network, the gateway will query the gatekeeper to find out where to send the call based on the dialed number. The gatekeeper is also aware of the state of the voice browsers and will load-balance calls across them to avoid sending calls to out-of-service voice browsers or ones that have no available sessions.
    • SIP Proxy Servers SIP Proxy Servers are used with Unified CVP to select voice browsers and associate them with specific dialed numbers. When a call comes into the network, the gateway queries the SIP Proxy Server to find out where to send the call based on the dialed number.
    Availability of Unified CVP can be increased by the following methods:
    • Adding redundant Unified CVP Call Servers under control of the Unified CCE Peripheral Gateways allows calls to be balanced automatically across multiple Unified CVP Call Servers.
    • Adding TCL scripts to the Unified CVP gateway to handle conditions where the gateway cannot contact the Unified CVP Call Server to direct the call correctly.
    • Adding gatekeeper redundancy with HSRP or gatekeeper clustering in H.323.
    • Adding a Cisco Content Services Switch (CSS) to load-balance .wav file requests across multiple Unified CVP Media Servers and VoiceXML URL access across multiple servers.
    • Calls in Unified CVP are not dropped if the Unified CVP Call Server or Unified CVP PG fails, because the TCL scripts (provided with the Unified CVP images) in the Voice Gateway can redirect the calls to another Unified CVP Call Server or another Unified CVP-controlled gateway as part of the fault-tolerant design.

    For more information about these options, review the Unified CVP product documentation.

    Cisco Multichannel Options with the CIM: E-mail Interaction Manager and Web Interaction Manager

    The Cisco Interaction Manager (CIM) platform is a single application that provides both E-mail and Web interaction management. CIM uses a common set of web servers and pages for agents and administrators, and it integrates with the Unified CCE platform to provide universal queuing of contacts to agents from different media channels.

    For additional design information about the Interaction Manager platform, see the Cisco Unified Web and E-Mail Interaction Manager Solution Reference Network Design (SRND) Guide at http:/​/​www.cisco.com/​en/​US/​products/​ps7236/​tsd_​products_​support_​series_​home.html.

    Cisco Interaction Manager Architecture Overview

    The Cisco Interaction Manager has several core components, as illustrated below.

    Figure 10. Cisco Interaction Manager Architecture

    The architecture is defined by a multi-tiered model, with various components at each of the following levels of the design.

    External Clients

    Cisco Interaction Manager is a 100% web-based product that agents and end-customers can access using a web browser from their desktops.

    Agents can access the application using Microsoft Internet Explorer 6.0 or the embedded CAD browser, and customers can access the chat customer console using specific versions of Microsoft IE, Mozilla, Firefox, or Netscape. Cisco Interaction Manager is not supported on agent desktops running in a Citrix terminal services environment.

    Tier 0: Firewall and Load Balancer

    Agents and customers connect to the application from their respective browsers through a firewall, if so configured for the application.

    A load balancer may also be used in the case of a distributed installation of the application, so that requests from agents and customers are routed to the least-loaded web servers.

    Tier 1: Web Server

    The web server is used to serve static content to the browser. Cisco Interaction Manager is designed to be indifferent to the specific type of web server being used; the single requirement is that the application server vendor must provide a web server plug-in for the corresponding application server.

    Tier 2: Application and File Server

    The application server is used as a web container (also known as the JSP or Servlet engine) and EJB container. The core business logic resides in the Business Object Layer, and stored procedures reside on the database server. The business logic, implemented in Java classes, is deployed on the application server. The JSPs or Servlets interact with the business objects through the business client layer, and these in turn interact with the database server to execute business logic on the data it holds.

    Example: Outbound Task Creation

    • A user logs in to the application and creates an outbound task.
    • The JSP layer calls the Business Client layer which interacts with Business Objects residing in the same application server where JSPs or Servlets are deployed.
    • The Business Objects execute queries and stored procedures residing on the database server.
    • Activities are created and stored in database tables.
    • The file server is used for storing all email and article attachment files, report templates, and all locale-specific strings used in the application.

    Tier 3: Services Server

    Cisco Interaction Manager has processes that perform specific business functions such as fetching e-mail from a POP server, sending e-mail to an SMTP server, processing workflows, assigning chats to the agents, and so forth. All services run on the Services Server and are managed by the Distributed Service Manager (DSM).

    Cisco Interaction Manager facilitates the creation of multiple instances of services with work distributed among the various instances. For example, the service used to retrieve e-mail may be configured to have multiple instances to retrieve e-mail from different email addresses. This capability can be used to process increasing volumes of customer interactions coming into a contact center.

    Data Tier: Database Server

    The data tier includes databases that are SQL-compliant, HTML/XML data-sources, and ultimately Web services that consume and produce SOAP messages. Business objects and data adapters use this layer to extract data from various third-party applications and data sources. This layer also deals with HTML and XML parsing using relevant J2EE-compliant packages to process data in other formats.

    Unified CCE Integration

    As part of the system integration with Unified CCE, the services server includes two additional services, the EAAS and the Listener Service, which interact with the Media Routing (MR) PG and Agent PG components of Unified CCE, respectively, through the Media Routing (MR) and Agent Resource Management (ARM) interfaces.

    Additionally, the application server of Cisco Interaction Manager establishes a connection with the Unified CCE Administration & Data server to import relevant configuration data and to map the configuration to Cisco Interaction Manager objects in the Cisco Interaction Manager database. Note that Cisco Interaction Manager does not make use of the Configuration API (ConAPI) interface.

    For certain deployments of Unified CCE, the Media Routing (MR) PG of Unified CCE can reside on the services server.

    In parent/child configurations, there is no multichannel routing and integration through the parent Unified ICM. Media Routing PGs need to connect to the child Unified CCE. A separate Cisco Interaction Manager or partition is required for each child.

    Likewise, in hosted Unified ICM/CCH environments, there is no multichannel routing through the Network Application Manager (NAM) layer and integration is at the individual Customer ICM (CICM) level only. The Media Routing (MR) PGs need to connect to the CICM.

    High-availability Considerations for Cisco Interaction Manager

    The Cisco Interaction Manager offers high-availability options using additional web and application servers and by using load balancing equipment to distribute agents and contact work more evenly across the platform. This also provides for fail-over in duplex or redundant models.

    Load Balancing Considerations

    The web service component of a Cisco Interaction Manager deployment can be load balanced to serve a large number of agents accessing the application at the same time. The web (or web and application) servers can be configured behind the load balancer with a Virtual IP, and agents access Cisco Interaction Manager through that Virtual IP. Depending on the selected load-balancing algorithm, the load balancer sends each request to one of the web and application servers behind it and sends the response back to the agent. In this way, from a security perspective, the load balancer also serves as a reverse proxy server.

    An essential load balancer configuration parameter is support for sticky sessions with cookie-based persistence. After every scheduled maintenance task, before access is opened for users, verify that all web and application servers are available to share the load; otherwise, the first web and application server may be overloaded due to the sticky connection feature. Other configurable parameters let you define a load-balancing algorithm that meets various objectives, such as equal load distribution, isolation of the primary web and application server, or sending fewer requests to a low-powered web and application server.
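    The sticky-session behavior described above can be sketched in a few lines. This is an illustrative model only, not CIM or load balancer code; the server names, cookie name, and random pick (standing in for whichever load-balancing algorithm is configured) are all hypothetical.

```python
import random

# Hypothetical pool of web/application servers behind the Virtual IP.
SERVERS = ["cim-web-app-1", "cim-web-app-2", "cim-web-app-3"]
COOKIE_NAME = "CIM_STICKY"  # hypothetical persistence cookie

def route_request(cookies, healthy):
    """Return (server, cookies_to_set). Reuse the pinned server while it
    stays healthy; otherwise pick a new server and re-pin the session
    (which is why users must log in again after a server failure)."""
    pinned = cookies.get(COOKIE_NAME)
    if pinned in healthy:
        return pinned, {}                    # sticky hit: same server again
    server = random.choice(sorted(healthy))  # stand-in for the LB algorithm
    return server, {COOKIE_NAME: server}     # pin the new session
```

    The health-check behavior from the next paragraph corresponds to removing a failed server from the `healthy` set so that no new requests are pinned to it.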

    The load balancer monitors the health of all web and application servers in the cluster. If a problem is observed, the load balancer removes the given web and application server from the available pool of servers, preventing new web requests from being directed to the problematic server.

    Managing Fail-over

    Cisco Interaction Manager supports clustered deployments. This ensures high-availability and performance through transparent replication, load balancing, and fail-over. The following key methods are available for handling failure conditions within a Cisco Interaction Manager and Unified CCE integrated deployment:
    • Implementing multiple web and application servers – If the primary server goes down, the load balancer handles the failure by routing requests to alternate servers. The load balancer detects the application server failure and redirects requests to another application server, after which a new user session is created and users must log in to Cisco Interaction Manager again.
    • Allowing servers to be dynamically added or removed from the online cluster to accommodate external changes in demand or internal changes in infrastructure.
    • Allowing Cisco Interaction Manager services to fail-over with duplexed Unified CCE components (for example, MR PIM and Agent PIM of the MR PG and Agent PG, respectively) to eliminate downtime of the application in failure circumstances.
    The single points of failure in Cisco Interaction Manager include the following:
    • The JMS server going down
    • The Services server going down
    • The Database server going down

    Cisco Unified Outbound Option Design Considerations

    The Cisco Unified Outbound Option provides the ability for Unified CCE to place calls on behalf of agents to customers based on a predefined campaign. The major components of the Unified Outbound Option (shown in the figure below) are:
    • Outbound Option Campaign Manager—A software module that manages the dialing lists and rules associated with the calls to be placed. This software is loaded on the Logger Side A platform and is not redundant; it can be loaded and active only on the Logger A of the duplexed pair of Loggers in the Unified CCE system.
    • Outbound Option Dialer—A software module that performs the dialing tasks on behalf of the Campaign Manager. In Unified CCE, the Outbound Option Dialer emulates a set of IP phones for Unified CM to make the outbound calls; it detects the called party and manages the interaction tasks with the CTI OS server to transfer the call to an agent. It also interfaces with the Media Routing Peripheral Gateway.
    • Media Routing Peripheral Gateway—A software component that is designed to accept route requests from non-inbound voice systems such as the Unified Outbound Option or the Multichannel products. In the Unified Outbound Option solution, each Dialer communicates with its own peripheral interface manager (PIM) on the Media Routing Peripheral Gateway.
    Figure 11. Unified CCE Unified Outbound Option

    The system can support multiple dialers across the enterprise, all of which are under control of the central Campaign Manager software.

    For the new SIP Dialer introduced in Unified CCE Release 8.0, Dialers operate in a warm standby mode similar to the PG fault tolerance model. For more details, see Outbound Option Description.

    The pre-existing SCCP Dialers do not function as a redundant or duplexed pair the way a Peripheral Gateway does. However, with a pair of Dialers under control of the Campaign Manager, a failure of one Dialer is handled automatically, and calls continue to be placed and processed by the surviving Dialer. Any calls that were already connected to agents remain connected, and agents experience no impact from the failure.

    In all deployments, the Dialers are co-resident on the Unified CCE Peripheral Gateway for Unified CM.

    Guidelines for high availability:
    • Deploy the Media Routing Peripheral Gateways in duplex pairs.
    • Deploy multiple Dialers, one on each side of the duplex Unified CCE Peripheral Gateway, and use them in the Campaign Manager to allow automatic fault recovery to a second Dialer in the event of a failure. For the SCCP Dialer, there are two options with multiple Dialers: a second Dialer can be configured with the same number of ports (100% redundancy), or the ports can be split across the two Dialers because they operate independently and are both active at the same time. In designs with a small number of Dialer ports, splitting them can impact the performance of the campaign.
    • Deploy redundant Voice Gateways for outbound dialing to ensure that the dialers have enough trunks available to place calls in the event of a Voice Gateway failure. In some instances where outbound is the primary application, these gateways are dedicated to outbound calling only.
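    As a rough illustration of the two SCCP Dialer options above, the following sketch compares the dialing capacity that survives a single Dialer failure. The port counts are hypothetical examples, not sizing guidance.

```python
def surviving_ports(ports_a, ports_b, failed_side):
    """Dialer ports still available after one SCCP Dialer fails."""
    return ports_b if failed_side == "A" else ports_a

# Option 1: 100% redundancy -- each Dialer carries the full 96 ports
# (hypothetical count), so a failure costs no campaign capacity.
full_redundancy = surviving_ports(96, 96, "A")   # 96 ports survive

# Option 2: the same 96 ports split across two active Dialers,
# so a failure leaves the campaign running at half capacity.
split_ports = surviving_ports(48, 48, "A")       # 48 ports survive
```

    The trade-off is cost versus capacity: full redundancy doubles the licensed ports, while splitting keeps the port count flat but degrades campaign throughput during a failure.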

    Peripheral Gateway Design Considerations

    The Agent PG uses the Unified CM CTI Manager process to communicate with the Unified CM cluster with a single Peripheral Interface Manager (PIM) controlling agent phones and CTI route points anywhere in the cluster. The Peripheral Gateway PIM process registers with CTI Manager on one of the Unified CM servers in the cluster and the CTI Manager accepts all JTAPI requests from the PG for the cluster. If the phone, route point, or other device that is being controlled by the PG is not registered to that specific Unified CM server in the cluster, the CTI Manager forwards that request to the other Unified CM servers in the cluster using Unified CM SDL links. There is no need for a PG to connect to multiple Unified CM servers in a cluster.

    Multiple PIM Connections to a Single Unified CM Cluster

    Although the Agent PG is described in this document as typically having only one PIM process that connects to the Unified CM cluster, the Agent PG can manage multiple PIM interfaces to the same Unified CM cluster. This capability can be used to create additional peripherals within Unified CCE for two purposes:
    • Improving Fail-over Recovery for Customers with Large Numbers of CTI Route Points
    • Scaling the Unified CCE PG Beyond 2,000 Agents per Server

    Improving Fail-over Recovery for Customers with Large Numbers of CTI Route Points

    When a Unified CCE PG fails-over, the PIM connection that was previously controlling the Unified CM cluster is disconnected from its CTI Manager and the duplex or redundant side of the PG will attempt to connect its PIM to the cluster using a different CTI Manager and Subscriber. This process requires the new PIM connection to register for all of the devices (phones, CTI Route Points, CTI Ports, and so forth) that are controlled by Unified CCE on the cluster. When the PIM makes these registration requests, all of them must be confirmed by Unified CM before the PIM can go into an active state and process calls.

    To help recover more quickly, the Unified CCE PG can have a PIM created that is dedicated to the customer's CTI Route Points. This PIM registers those devices at a rate of approximately five per second, and it can therefore activate and respond to calls hitting the CTI Route Points faster than if it had to wait for all of the route points, then all of the agent phones, and all of the CTI ports.

    This dedicated CTI Route Point PIM can become active several minutes sooner and direct new inbound calls to queuing or treatment resources while waiting for the Agent PIM with the phones and CTI Ports to complete the registration process and become active.

    This does not provide any additional scaling or other benefits for the design; the only purpose is to allow Unified CM to have the calls on the CTI Route Points serviced faster by this dedicated PIM. Use this only with customers who have more than 250 Route Points because anything less does not provide a reasonable improvement in recovery time. Additionally, associate only the CTI Route Points that are serviced by Unified CCE with this PIM and provide it with its own dedicated CTI-Enabled JTAPI or PG user that is specific to the CTI Route Point PIM.

    Scaling the Unified CCE PG Beyond 2,000 Agents per Server

    In Unified CCE, multiple PIMs in the same physical PG server may be used to connect either to the same Unified CM cluster or to a second Unified CM cluster. This design reduces the physical number of PG servers required in the Unified CCE design. This is different from the recovery strategy for multiple PIMs because both of these PIMs are configured with up to 2,000 concurrent agents and their related CTI Route Points and CTI Ports as needed to support those agents. The additional PIM creates another Peripheral from the Unified CCE perspective, which might impact routing and reporting. Additionally, agent teams and supervisors cannot cross peripherals, so careful consideration must be given to which agent groups are allocated to each PIM and Peripheral in such a design.

    In designs where Unified CCE is deployed with Unified CVP, the Cisco Unified Communications Sizing Tool might show that the Unified CM cluster can support more than 2,000 total agents; however, the CTI Manager and JTAPI interfaces are tested and supported with a maximum of only 2,000 agents. To allow for a design with a single Unified CM cluster with more than 2,000 agents, a second Agent PIM is configured to support the additional agents (up to a total of 4,000 agents per PG).
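    The PIM arithmetic implied above can be summarized as follows. The 2,000-agents-per-PIM and 4,000-agents-per-PG limits come from the text; the helper function itself is only an illustration, not a sizing tool.

```python
import math

AGENTS_PER_PIM = 2000   # tested/supported CTI Manager + JTAPI limit per Agent PIM
PIMS_PER_PG = 2         # up to 4,000 agents per PG across two Agent PIMs

def pims_and_pgs(agents):
    """Minimum Agent PIMs and PG servers for a given concurrent agent count."""
    pims = math.ceil(agents / AGENTS_PER_PIM)
    pgs = math.ceil(pims / PIMS_PER_PG)
    return pims, pgs
```

    For example, 3,500 agents on a single Unified CM cluster require two Agent PIMs, which fit on one PG; remember that each additional PIM is a separate Peripheral, with the routing, reporting, and agent-team implications noted above.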

    The following figure illustrates a single Unified CCE PG with two different PIMs pointing to the same Unified CM cluster.
    Figure 12. Two PIMs Configured to the Same Unified CM Cluster


    Note


    Use the Cisco Unified Communications Sizing Tool (Unified CST) to size the Unified CM cluster properly for Unified CCE. This tool is only available to Cisco partners and employees with proper login authentication.


    Redundant or Duplex Unified CCE Peripheral Gateway Considerations

    Unified CCE Agent PGs are deployed in a redundant or duplex configuration because the PG has only one connection to the Unified CM cluster using a single CTI Manager. If that CTI Manager were to fail, the PG is no longer able to communicate with the Unified CM cluster. Adding a redundant or duplex PG allows Unified CCE to have a second pathway or connection to the Unified CM cluster using a second CTI Manager process on a different Unified CM server in the cluster.

    The minimum requirement for Unified CCE high-availability support for CTI Manager and Unified IP IVR is a duplex (redundant) Agent PG environment with one Unified CM cluster containing at least two subscribers. Therefore, the minimum configuration for a Unified CM cluster in this case is one publisher and two subscribers. This minimum configuration ensures that if the primary subscriber fails, the devices re-home to the secondary subscriber and not to the publisher for the cluster. In smaller systems and labs, Cisco permits a single publisher and single subscriber. But if the subscriber fails, then all the devices are active on the publisher. For specific details about the number of required Unified CM servers, see the chapter on Sizing Cisco Unified Communications Manager Servers.

    Figure 13. Unified CCE High-availability with One Unified CM Cluster

    To simplify the illustration in Figure 13, the Unified CCE Server or Unified CCE Central Controller is represented as a single server, but it is actually a set of servers sized according to the Unified CCE agent count and call volume. The Unified CCE Central Controllers include the following redundant or duplex servers:
    • Call Router — The core of the CCE complex that provides intelligent call routing instructions based on real-time conditions that it maintains in memory across both the A-Side and B-Side Call Router processes.
    • Logger and Database Server — The repository for all configuration and scripting information as well as historical data collected by the system. The Loggers are paired with Call Routers such that Call Router Side A reads and writes data only to Logger A, and Call Router Side B reads and writes only to Logger B. Because both sides of the Call Router processes are synchronized, the data written to both Loggers is identical.

    In specific deployment models, these two components can be installed on the same physical server which is then referred to as a Rogger, or combined Router and Logger. See the chapter on Sizing Unified CCE Components and Servers for more details on these specific configurations.

    Unified Communications Manager JTAPI and Peripheral Gateway Failure Detection

    There is a heartbeat mechanism that is used to detect failures between the Unified Communications Manager JTAPI link and the Peripheral Gateway. However, unlike the Unified CCE heartbeat methods that use TCP keep-alive messages on the open socket ports, this method uses a specific heartbeat message in the JTAPI messaging protocol between the systems. By default, the heartbeat messages are sent every 30 seconds, and the communications path is reset by the Unified Communications Manager or Peripheral Gateway after missing two consecutive heartbeat messages.

    This failure detection can be enhanced by using the following procedure to change the heartbeat interval on the JTAPI Gateway client that runs on the Peripheral Gateway.

    Procedure
      Step 1   From the Start Menu of the Peripheral Gateway, Select Programs > Cisco JTAPI > JTAPI Preferences.
      Step 2   Set the Advanced > Server Heartbeat Interval (sec) field to 5 seconds.
      Note   

      Do not set this value lower than five seconds because it might impact system performance and trigger an inappropriate fail-over. This setting determines how often the heartbeats are generated. If it is set to five seconds, the system will fail-over this connection within ten seconds of a loss of network connection (because it must detect two consecutive missed heartbeats). The default of 30 seconds means that it takes up to one minute (60 seconds) to take action on a network connection failure.

      Because this JTAPI connection between the Peripheral Gateway and Unified Communications Manager is only supported locally on the same LAN segment, there is no latency issue for this heartbeat value. However, if there are any additional network hops, firewalls, or other devices that cause delay between these two components, then set the heartbeat interval value accordingly to account for this delay.
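      The detection arithmetic in the note above follows directly from the two-consecutive-missed-heartbeats threshold. This small sketch only restates that math; it is not part of the JTAPI Preferences configuration.

```python
MISSED_HEARTBEATS_THRESHOLD = 2   # connection reset after two missed heartbeats

def worst_case_detection_seconds(interval_seconds):
    """Approximate worst-case time to detect a dead JTAPI connection."""
    return interval_seconds * MISSED_HEARTBEATS_THRESHOLD

recommended = worst_case_detection_seconds(5)    # 10 seconds with the 5-second setting
default = worst_case_detection_seconds(30)       # 60 seconds with the 30-second default
```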


      Unified CCE Redundancy Options

      Duplex or redundant Unified CCE servers can be located at the same physical site or they can be geographically distributed. This applies specifically to the Central Controller (Call Router and Logger) and Peripheral Gateways.

      Under normal operations, the Unified CCE Call Router and Logger and Database Server processes are interconnected through a Private Network connection that is isolated from the Visible or Public Network segment. Configure these servers with a second NIC card for the Private Network connection and isolate the Private connections from the rest of the Visible or Public Network in their own Cisco Catalyst switch if they are located at the same physical site.

      If the Central Controllers are geographically separated (located at two different physical sites), under normal operations the same Private Network connections must continue to be isolated and connected between the two physical sites with a separate WAN connection. For normal operations, do not provision this Private Network connection on the same circuits or network gear as the Visible or Public Network WAN connection because that creates a single point of failure that could disable both WAN segments at the same time.

      The Unified CCE Peripheral Gateway duplex pair of servers is also interconnected through a Private Network connection that is isolated from the Visible or Public Network segment under normal operations. If the two sides of the duplex pair (Side A and Side B) are both at the same physical site, the Private Network can be created by using an Ethernet Cross-Over Cable between the two servers to interconnect their Private Network NIC cards. If the two servers in the duplex pair are geographically distributed (located at two different physical sites), the Private Network connections must be connected with a separate WAN connection between the two physical sites. Do not provision this Private Network connection on the same circuits or network gear as the Visible or Public Network WAN connection because that creates a single point of failure that could disable both WAN segments at the same time.

      For additional details on the Unified ICM network requirements for this connection, see the installation guides.

      For additional details on the Unified CCE network requirements for clustered over the WAN, see the section on IPT: Clustering Over the WAN.

      Within the Agent PG, two software processes manage the connectivity to the Unified CM cluster:
      • JTAPI Gateway The JTAPI Gateway is installed on the PG by downloading it from the Unified CM cluster at the time of the PG installation. This ensures compatibility with the JTAPI and CTI Manager versions in the system. Note that when either the PG or Unified CM is upgraded, this JTAPI Gateway component must be removed and re-installed on the PG. The JTAPI Gateway is started by the PG automatically and runs as a node-managed process. The PG monitors this process and automatically restarts it if it fails for any reason. The JTAPI Gateway handles the low-level JTAPI socket connection protocol and messaging between the PIM and the Unified CM CTI Manager.
      • Agent PG Peripheral Interface Manager (PIM) The PIM is also a node-managed process and is monitored for unexpected failures and automatically restarted. This process manages the higher-level interface between the Unified CCE and the JTAPI Gateway and Unified CM cluster, requesting specific objects to monitor and handling route requests from the Unified CM cluster.

      In a duplex Agent PG environment, the JTAPI services from both Agent PG sides log into the CTI Manager upon initialization: Unified CM PG Side A logs into the primary CTI Manager, and PG Side B logs into the secondary CTI Manager. Only the active side of the Unified CM PG registers monitors for phones and CTI route points. The duplex Agent PG pair works in hot-standby mode, with only the active PG side PIM communicating with the Unified CM cluster. The standby side logs into the secondary CTI Manager only to initialize the interface and make it available for a fail-over. Registration and initialization of the Unified CM devices take a significant amount of time; having the standby CTI Manager connection already available therefore significantly decreases the fail-over time.

      In duplex PG operation, the side that goes active is the PG side that is first able to connect to the Unified CCE Call Router server and request configuration information. It is not determined by the Side A or Side B designation of the PG; it depends only on the ability of the PG to connect to the Call Router. The Call Router ensures that only the PG side with the best connection goes active.

      The startup process of the PIM requires that all of the CTI route points be registered first, at a rate of 5 route points per second. For systems with many CTI route points (for example, 1000), this process can take more than 3 minutes to complete before the system allows any agents to log in. This time can be reduced by distributing the devices over multiple PIM interfaces to the Unified CM cluster, as noted above.
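      The registration delay quoted above can be estimated directly from the 5-registrations-per-second rate; the helper below is illustrative only.

```python
REGISTRATIONS_PER_SECOND = 5   # CTI route point registration rate at PIM startup

def startup_delay_seconds(route_points):
    """Approximate time before the PIM activates and agents can log in."""
    return route_points / REGISTRATIONS_PER_SECOND

delay = startup_delay_seconds(1000)   # 200 seconds, roughly 3.3 minutes
```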

      In the event that calls arrive at the CTI Route Points in Unified CM but the PIM is not yet fully operational, these calls fail unless these route points are configured with a recovery number in their "Call Forward on Unregistered" or "Call Forward on Failure" setting. These recovery numbers can be the Cisco Unity voicemail system for the Auto Attendant (or perhaps the company operator position) to ensure that the incoming calls are being answered.

      Unified CM Failure Scenarios

      A fully redundant Unified CCE system contains no single points of failure. However, there are scenarios in which a combination of multiple failures can reduce Unified CCE system functionality and availability. Also, if a component of the Unified CCE solution does not itself support redundancy and fail-over, existing calls on that component are dropped. The following failure scenarios have the most impact on high availability; the Unified CM Peripheral Interface Managers (PIMs) cannot activate if either of the following occurs:
      • Agent PG/PIM Side A and the secondary CTI Manager that services the PG/PIM on Side B both fail.
      • Agent PG/PIM Side B and the primary CTI Manager that services the PG/PIM on Side A both fail.

      In either of these cases, Unified CCE will not be able to communicate with the Unified CM cluster.

      Figure 14. Unified CM PGs Cannot Cross-Connect to Backup CTI Managers

      Unified CCE Fail-over Scenarios

      This section describes how redundancy works in the following failure scenarios:
      • Scenario 1: Unified CM and CTI Manager Fail
      • Scenario 2: Agent PG Side A Fails
      • Scenario 3: The Unified CM Active Call Processing Subscriber Fails
      • Scenario 4: The Unified CM CTI Manager Providing JTAPI Services to the Unified CCE PG Fails

      Scenario 1: Unified CM and CTI Manager Fail

      The following figure shows a complete system failure or loss of network connectivity on Cisco Unified CM subscriber A. The CTI Manager and Cisco Unified CM services were initially both active on this same server, and Unified CM subscriber A is the primary CTI Manager in this case.
      Figure 15. Scenario 1—Unified CM and CTI Manager Fail

      The following conditions apply to this scenario:

      • All phones and gateways are registered with Unified CM subscriber A as the primary server.
      • All phones and gateways are configured to re-home to Unified CM subscriber B (that is, B is the backup server as part of the redundancy group in Unified CM).
      • Unified CM subscribers A and B are each running a separate instance of CTI Manager within the same Unified CM cluster.
      • When Unified CM subscriber A fails, all registered phones and gateways re-home to Unified CM subscriber B. Calls that are in progress with agent phones remain active, but the agents are not able to use phone services such as conference or transfer until they hang up the call and their phone re-registers with the backup subscriber. Although the call stays active, Unified CCE loses visibility to the call and writes a Termination Call Detail (TCD) record to the Unified CCE database for the call at the time of the failure. No additional call data (such as wrap-up codes) are written about the call after that point. Phones that are not active on a call re-home automatically.
      • PG Side A detects a failure and induces a fail-over to PG Side B.
      • Depending on the configuration of the Peripheral in Unified CCE, the CTI OS or CAD server keeps the agent logged in but "grays out" their desktop controls until the PG has completed its fail-over processing. The agents might not have to log in again but might have to manually make themselves "ready" or "available" to ensure that they are aware that call processing functionality has been restored.
      • PG Side B becomes active and registers all dialed numbers and phones, and call processing continues.
      • As noted above, when the PG fails-over, the Unified CCE Call Router writes a Termination Call Detail Record (TCD) in the Unified CCE database for any active calls. If the call is still active when the PG fails-over to the other side, a second TCD record is written for this call as if it were a "new" call in the system and not connected to the prior call that was recorded in the database.
      • When Unified CM subscriber A recovers, all idle phones and gateways re-home to it. Active devices wait until they are idle before re-homing to the primary subscriber.
      • PG Side B remains active using the CTI Manager on Unified CM subscriber B.
      • After recovery from the failure, the PG does not fail back to the A side of the duplex pair. All CTI messaging is handled using the CTI Manager on Unified CM subscriber B which communicates with Unified CM subscriber A to obtain phone state and call information.

      Scenario 2: Agent PG Side A Fails

      The figure below shows a failure on PG Side A and a fail-over to PG Side B. All CTI Manager and Unified CM services continue running normally.
      Figure 16. Scenario 2—Agent PG Side A Fails

      The following conditions apply to this scenario:
      • All phones and gateways are registered with Unified CM subscriber A.
      • All phones and gateways are configured to re-home to Unified CM subscriber B (that is, B is the backup server); however, they do not need to re-home as the primary subscriber continues to be functional.
      • Unified CM subscribers A and B are each running a local instance of CTI Manager.
      • When PG Side A fails, PG Side B becomes active.
      • PG Side B registers all dialed numbers and phones and call processing continues. Phones and gateways stay registered and operational with Unified CM subscriber A; they do not fail-over.
      • Calls in progress stay in progress, but with no third-party call control (conference, transfer, and so forth) available from the agent desktop soft phones. Agents who were not on calls may notice their CTI desktop disable the agent state or third-party call control buttons during the fail-over to the B-Side PG. Once the fail-over is complete, the agent desktop buttons are restored; however, barge-in and conference calls are not rebuilt properly, and those calls disappear from the desktop when either of the participants drops out of the call. Call Type indications of Transfer, Barge In, Intercept, Supervisor Assist, and Emergency Assist are not recovered in the agent desktop or in reporting.
      • In most cases, after a PG fail-over, agents whose states are Available or Wrap Up are moved to Available. Alternatively, agents may receive a prompt to log in or to change their state from Not Ready to Available.
      • In most cases, after a PG fail-over, agents and supervisors whose states were either Talking or Reserved before the fail-over will be returned to Talking or Reserved, respectively (if the call is still active). Once the call ends, the agents' or supervisors' states change to Available.
      • When the PG fails-over, the Unified CCE Call Router writes a Termination Call Detail Record (TCD) in the Unified CCE database for any active calls. If the call is still active when the PG fails-over to the other side, a second TCD record is written for this call as if it were a "new" call in the system and not connected to the prior call that was recorded in the database.
      • When PG Side A recovers, PG Side B remains active and uses the CTI Manager on Unified CM subscriber B. The PG does not fail-back to Side A, and call processing continues on the PG Side B.

      Scenario 3: The Unified CM Active Call Processing Subscriber Fails

      The figure below shows a failure on Unified CM active call processing subscriber A. In this model, the subscriber is actively processing calls and controlling devices but does not provide the CTI Manager connection to the Unified CCE PG. The CTI Manager services are running on all the Unified CM subscribers in the cluster, but only subscribers C and D are configured to communicate with the Unified CCE Peripheral Gateway.

      Figure 17. Scenario—Only the Primary Unified CM Subscriber Fails

      The following conditions apply to this scenario:
      • All phones and gateways are registered with Unified CM subscriber A.
      • All phones and gateways are configured to re-home to Unified CM subscriber B (that is, B is the backup server).
      • Unified CM subscribers C and D are each running a local instance of CTI Manager to provide JTAPI services for the Unified CCE PGs.
      • If Unified CM subscriber A fails, phones and gateways re-home to the backup Unified CM subscriber B.
      • PG Side A remains connected and active with a CTI Manager connection on Unified CM subscriber C. It does not fail-over because the JTAPI-to-CTI Manager connection has not failed. However, it sees the phones and devices being unregistered from Unified CM subscriber A (where they were registered) and is notified automatically when these devices re-register on Unified CM subscriber B. During the time that the agent phones are not registered, the PG disables the agent CTI desktops to prevent the agents from attempting to use the system while their phones are not actively registered with a Unified CM subscriber. The agents are also logged out by the system during this transition to avoid routing calls to them.
      • Call processing continues for any devices not registered to Unified CM subscriber A. Call processing also continues for those devices on subscriber A when they are re-registered with their backup subscriber.
      • Calls in progress on phones registered to Unified CM subscriber A continue; however, the agent desktop is disabled to prevent any conference, transfer, or other third-party call control during the fail-over. After the agent disconnects the active call, that agent’s phone re-registers with the backup subscriber. The agent is logged out and will need to log in again.
      • As noted above, when Unified CM subscriber A fails, the calls in progress stay active; however, Unified CCE loses control of and visibility into those calls because the phone has not re-homed (re-registered) with the backup subscriber in the cluster. In fact, the phone does not re-home until after the current call is completed. The Unified CCE Call Router writes a Termination Call Detail (TCD) record in the Unified CCE database for calls that were active at the time of the subscriber failure, with call statistics up to the time of the failure and loss of control. Any additional call information (statistics, call wrap-up data, and so forth) is not written to the Unified CCE database.
      • When Unified CM subscriber A recovers, phones and gateways re-home to it. This re-homing can be set up on Unified CM to gracefully return groups of phones and devices over time or to require manual intervention during a maintenance window to minimize the impact to the call center. During this re-homing process, the CTI Manager service notifies the Unified CCE Peripheral Gateway of the phones being unregistered from the backup Unified CM subscriber B and re-registered with the original Unified CM subscriber A.
      • Call processing continues normally after the phones and devices have returned to their original subscriber.

      Scenario 4: The Unified CM CTI Manager Providing JTAPI Services to the Unified CCE PG Fails

      The figure below shows a CTI Manager service failure on Unified CM subscriber C that is used to communicate with the Unified CCE PG. The CTI Manager services are running on all the Unified CM subscribers in the cluster, but only subscribers C and D are configured to connect to the Unified CCE PGs. During this failure, the PG detects the loss of the JTAPI connection and fails-over to the redundant PG side.
      Figure 18. Scenario 4—Only the Unified CM CTI Manager Service Fails

      The following conditions apply to this scenario:
      • All phones and gateways are registered with Unified CM subscriber A.
      • All phones and gateways are configured to re-home to Unified CM subscriber B (that is, B is the backup server). In this case they do not re-home because subscriber A is still functional.
      • Unified CM subscribers C and D are each running a local instance of CTI Manager and are designed to connect to the Unified CCE PGs.
      • If the Unified CM CTI Manager service on subscriber C fails, the PG Side A detects a failure of the CTI Manager service and induces a fail-over to PG Side B.
      • PG Side B registers all dialed numbers and phones with the Unified CM CTI Manager service on subscriber D and call processing continues.
      • Agents with calls in progress stay in progress but with no third-party call control (conference, transfer, and so forth) available from their agent desktop soft phones. After an agent disconnects from all calls, that agent’s desktop functionality is restored. Although the call stays active, Unified CCE loses visibility to the call and writes a Termination Call Detail (TCD) record to the Unified CCE database for the call at the time of the failure. No additional call data such as wrap-up codes are written about the call after that point.
      • When the Unified CM CTI Manager service on subscriber C recovers, PG Side B continues to be active and uses the CTI Manager service on Unified CM subscriber D. The PG does not fail-back in this model.

      Unified CCE Scenarios for Clustering over the WAN

      Unified CCE can also be overlaid with the Unified CM design model for clustering over the WAN, which allows for high-availability of Unified CM resources across multiple locations and data centers. There are a number of specific design requirements for Unified CM to support this deployment model; Unified CCE adds its own specific requirements and new fail-over considerations to the model.

      Specific testing has been performed to identify the design requirements and fail-over scenarios. The success of this design model relies on specific network configuration and setup and the network must be monitored and maintained. The component failure scenarios noted previously (see Unified CCE Fail-over Scenarios) are still valid in this model. Additional failure scenarios for this model include:
      • Scenario 1: Unified CCE Central Controller or Peripheral Gateway Private Network Failure
      • Scenario 2: Visible Network Failure
      • Scenario 3: Visible and Private Networks Both Fail (Dual Failure)
      • Scenario 4: Unified CCE Agent Site WAN (Visible Network) Failure

      Note


      The terms public network and visible network are used interchangeably throughout this document.


      Scenario 1: Unified CCE Central Controller or Peripheral Gateway Private Network Failure

      In clustering over the WAN with Unified CCE, provide a separate private network connection between the geographically distributed Central Controller (Call Router and Logger) and the split Peripheral Gateway pair to maintain state and synchronization between the sides of the system.

      To understand this scenario fully, a brief review of the Unified CCE Fault Tolerant architecture is warranted. On each call router, there is a process known as the Message Delivery Service (MDS) which delivers messages to and from local processes such as router.exe and which handles synchronization of messages to both call routers. For example, if a route request comes from the carrier or any routing client to Side A, MDS ensures that both call routers receive the request. MDS also handles the duplicate output messages.

      The MDS process ensures that the duplexed Unified CCE sides function in a synchronized-execution, fault-tolerant manner. Both routers execute everything in lockstep based on the input each router receives from MDS. Because of this synchronized execution method, the MDS processes must always be in communication with each other over the private network. They use TCP keep-alive messages generated every 100 ms to verify the health of the redundant mate on the other side. Missing five consecutive TCP keep-alive messages indicates to Unified CCE that the link or the remote partner system might have failed.
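The keep-alive behavior described above can be sketched as a simple counter. This is a hedged illustration only: the 100 ms interval and five-miss threshold come from the text, but the class and method names are invented for this example, not Cisco's implementation.

```python
KEEPALIVE_INTERVAL_MS = 100   # TCP keep-alives generated every 100 ms
MISS_THRESHOLD = 5            # five consecutive misses => peer presumed failed


class KeepAliveMonitor:
    """Tracks consecutive missed keep-alives from the redundant mate."""

    def __init__(self):
        self.consecutive_misses = 0

    def on_tick(self, keepalive_received):
        """Called once per keep-alive interval; returns True when failure is declared."""
        if keepalive_received:
            self.consecutive_misses = 0   # any keep-alive resets the count
            return False
        self.consecutive_misses += 1
        return self.consecutive_misses >= MISS_THRESHOLD


# Worst-case detection time implied by the text: 5 x 100 ms = 500 ms.
DETECTION_MS = KEEPALIVE_INTERVAL_MS * MISS_THRESHOLD
```

Note that a single received keep-alive resets the counter, so only five consecutive misses trigger the failure declaration.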

      When running duplexed Unified CCE sides for any production system, one MDS is the enabled synchronizer and is in a paired-enabled state; its partner is the disabled synchronizer and is said to be paired-disabled. Whenever the sides are running synchronized, the Side A MDS is the enabled synchronizer in the paired-enabled state and the Side B MDS is the disabled synchronizer in the paired-disabled state. The enabled synchronizer sets the ordering of input messages to the router and also maintains the master clock for the Unified CCE system.

      If the private network fails between the Unified CCE Central Controllers, the following conditions apply:
      • The Call Routers detect the failure by missing five consecutive TCP keep-alive messages. The currently enabled side (Side A in most cases) transitions to an isolated-enabled state and continues to function as long as it is in communication with at least half of the PGs configured in the system.
      • The paired-disabled side (Side B in most cases) transitions to an isolated-disabled state. This side will then check for device majority. If it is not communicating with either an Active or Idle DMP to more than half of the configured PGs in the system, it stops processing and stays disabled.
      • If the B-Side has device majority (an Active or Idle connection to more than half the configured PGs), it transitions to a "Testing" state and sends "Test Other Side" (TOS) messages to each PG. This message is used to ask the PG if it can see the Call Router on the other side (in this case, Router A).
      • As soon as any (even one) PG responds to the TOS message that the A-Side is still enabled, Router B remains in the isolated-disabled state and goes idle. Logger B also goes idle, as do all the DMP connections to the PGs for Router B. All call processing continues on Side A without impact.
      • If all of the PGs reply that Side A is down or not reachable, the B-Side Call Router re-initializes in simplex mode (isolated-enabled) and takes over all routing for Unified CCE.
      • There is no impact to the agents, calls in progress, or calls in queue. The system can continue to function normally; however the Call Routers are in simplex mode until the private network link is restored.
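The decision procedure in the bullets above can be sketched as two small functions. This is a hedged illustration: the strict-majority rule and the "even one PG sees Side A" test come from the text, but the function names and the PG-state representation are assumptions made for this example, not Cisco code.

```python
def has_device_majority(pg_states, total_configured):
    """Strict majority: an Active or Idle DMP connection to MORE than half
    of the PGs configured in Call Router Setup (not merely those reachable)."""
    usable = sum(1 for state in pg_states if state in ("active", "idle"))
    return usable > total_configured / 2


def disabled_side_action(pg_states, total_configured, tos_replies):
    """Decide what the isolated-disabled side (usually Side B) does.

    tos_replies: per-PG answers to 'Test Other Side' -- True means that
    PG can still see the enabled (Side A) Call Router.
    """
    if not has_device_majority(pg_states, total_configured):
        return "stay-isolated-disabled"      # no device majority: stop processing
    if any(tos_replies):                     # even one PG still sees Side A
        return "stay-isolated-disabled"      # Side A keeps routing; B goes idle
    return "go-isolated-enabled"             # take over all routing in simplex mode
```

Because the majority test is strict (greater than half), a side with connections to exactly half of an even number of configured PGs does not have device majority, which is why only Side A runs in that case.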

      Additional Considerations

      The Call Routers are "paired" with the Loggers and can read and write only to their own local Logger for configuration and historical data over the private network. If the failure is caused by the loss of the private NIC in the Call Router and that Call Router is the enabled side, it cannot write any historical data to the Logger, nor can any configuration changes be made to the Logger database.

      The Private NIC in the Call Router is also used in some cases to communicate with carrier-based Pre-Routing Network or SS7 interfaces. If the Private NIC fails, there is no access to these services.

      If there is an even number of PGs specified in the Call Router Setup and only half of the PGs are available, then only Side A runs. For the B-Side to be operational during a private network failure, it must be able to communicate with more than half of the PGs in the system.

      It is important to maintain the configuration so that "extra" PGs or PGs that are no longer on the network are removed from the Call Router Setup panels to avoid problems with determination of device majority for PGs that no longer exist.

      If the private network fails between the Unified CM Peripheral Gateways, the following conditions apply:
      • The Peripheral Gateway sides detect a failure if they miss five consecutive TCP keep-alive messages, and they follow a process similar to that of the Call Routers, leveraging the MDS process to handle a private link failure. As with the Central Controllers, one MDS process is the enabled synchronizer and its redundant side is the disabled synchronizer. When running redundant PGs, the A side is always the enabled synchronizer.
      • After detecting the failure, the disabled synchronizer (Side B) initiates a test of its peer synchronizer by using the TOS procedure on the Public or Visible Network connection. If PG Side B receives a TOS response stating that the A side synchronizer is enabled or active, then the B side immediately goes out of service, leaving the A side to run in simplex mode until the Private Network connection is restored. The PIM, OPC, and CTI SVR processes become active on PG Side A, if not already in that state, and the CTI OS Server process still remains active on both sides as long as the PG Side B server is healthy. If the B side does not receive a message stating that the A side is enabled, then Side B continues to run in simplex mode and the PIM, OPC, and CTI SVR processes become active on PG Side B if not already in that state. This condition occurs only if the PG Side A server is truly down or unreachable due to a double failure of visible and private network paths.
      • There is no impact to the agents, calls in progress, or calls in queue because the agents stay connected to their already established CTI OS Server process connection. The system can continue to function normally; however, the PGs are in simplex mode until the private network link is restored.

      If the two private network connections are combined into one link, the failures follow the same path; however, the system runs in simplex mode on both the Call Router and the Peripheral Gateway. If a second failure were to occur at that point, the system could lose some or all of its call routing and ACD functionality.

      Scenario 2: Visible Network Failure

      The visible network in this design model is the network path between the data center locations where the main system components (Unified CM subscribers, Peripheral Gateways, Unified IP IVR/Unified CVP components, and so forth) are located. This network is used to carry all the voice traffic (RTP stream and call control signaling), Unified CCE CTI (call control signaling) traffic, as well as all typical data network traffic between the sites. To meet the requirements of Unified CM clustering over the WAN, this link must be highly available with very low latency and sufficient bandwidth. This link is critical to the Unified CCE design because it is part of the fault-tolerant design of the system. It must also be highly resilient.

      • The high-availability (HA) WAN between the central sites must be fully redundant with no single point of failure. (For information regarding site-to-site redundancy options, see the WAN infrastructure and QoS design guides.) In case of partial failure of the high-availability WAN, the redundant link must be capable of handling the full central-site load with all QoS parameters. For more information, see the section on Bandwidth Requirements for Unified CCE Clustering Over the WAN.
      • An HA WAN using point-to-point technology is best implemented across two separate carriers, but this is not necessary when using a ring technology.
      If the visible network fails between the data center locations, the following conditions apply:
      • The Unified CM subscribers detect the failure and continue to function locally with no impact to local call processing and call control. However, any calls that were set up over this WAN link fail with the link.
      • The Unified CCE Call Routers detect the failure because the normal flow of TCP keep-alives from the remote Peripheral Gateways stops. Likewise, the Peripheral Gateways detect this failure by the loss of TCP keep-alives from the remote Call Routers. The Peripheral Gateways automatically realign their data communications to the local Call Router and the local Call Router then uses the private network to pass data to the Call Router on the other side to continue call processing. This does not cause a fail-over of the Peripheral Gateway or the Call Router.
      • Half the agents (or more) might be affected by this failure under the following circumstances:
        • If the agent desktop (Cisco Agent Desktop or CTI OS) is registered to the Peripheral Gateway on Side A of the system but the physical phone is registered to Side B of the Unified CM cluster. Under normal circumstances, phone events are passed from Side B to Side A over the visible network by using the CTI Manager Service to present these events to the Side A Peripheral Gateway. The visible network failure does not force the IP phone to re-home to Side A of the cluster and the phone remains operational on the isolated Side B. The Peripheral Gateway is no longer able to see this phone and the agent is logged out of Unified CCE automatically because the system can no longer direct calls to the agent’s phone.
        • If the agent desktop (Cisco Agent Desktop or CTI OS) and IP phone are both registered to Side A of the Peripheral Gateway and Unified CM, but the phone is reset and re-registers to a Side B Unified CM subscriber. If the IP phone re-homes or is manually reset and forced to register to a Side B Unified CM subscriber, the Unified CM subscriber on Side A that is providing the CTI Manager service to the local Peripheral Gateway unregisters the phone and removes it from service. Because the visible network is down, the remote Unified CM subscriber at Side B cannot send the phone registration event to the remote Peripheral Gateway. Unified CCE logs this agent out because it can no longer control the phone for the agent.
        • If the agent desktop (CTI Toolkit Agent Desktop or Cisco Agent Desktop) is registered to the CTI OS Server at the side-B site but the active Peripheral Gateway side is at the side-A site. Under normal operation, the CTI Toolkit Agent Desktop load-balances its connections across the CTI OS Server pair. At any given time, half the agent connections are on a CTI OS Server that has to cross the visible network to connect to the active Peripheral Gateway CTI Server (CG). When the visible network fails, the CTI OS Server detects the loss of connection with the remote Peripheral Gateway CTI Server (CG) and disconnects the active agent desktop clients to force them to re-home to the redundant CTI OS Server at the remote site. The CTI Toolkit Agent Desktop is aware of the redundant CTI OS Server and automatically uses it. During this transition, the CTI Toolkit Agent Desktop is disabled and returns to an operational state as soon as it is connected to the redundant CTI OS Server. (The agent may be logged out or put into the not-ready state, depending on the /LOAD parameter defined for the Unified CM Peripheral Gateway in Unified CCE Configuration Manager.)
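The desktop re-homing behavior described above can be sketched from the client's point of view. This is an illustrative sketch only, not the actual CTI OS Toolkit API: the class name, method names, and server names are all invented for this example.

```python
class AgentDesktopConnection:
    """Desktop client that knows both CTI OS servers and re-homes on failure."""

    def __init__(self, servers):
        self.servers = list(servers)   # e.g. ["ctios-sideA", "ctios-sideB"]
        self.current = 0               # index of the server we are homed to
        self.controls_enabled = True   # agent state / call-control buttons usable?

    def on_connection_lost(self):
        """Our server dropped us (e.g. it lost its path to the active CG)."""
        self.controls_enabled = False                       # desktop disabled during re-home
        self.current = (self.current + 1) % len(self.servers)
        return self.servers[self.current]                   # next server to connect to

    def on_connected(self):
        self.controls_enabled = True                        # operational state restored
```

For example, a desktop homed to the first server in its list would re-home to the second when disconnected, with its controls disabled until the new connection completes.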

      Scenario 3: Visible and Private Networks Both Fail (Dual Failure)

      Individually, the private and visible networks can fail with limited impact to the Unified CCE agents and calls. However, if both of these networks fail at the same time, the system is reduced to very limited functionality. This failure is considered catastrophic and can be avoided by careful WAN design with backup and resiliency built into the design.

      If both the visible and private networks fail at the same time, the following conditions apply:
      • The Unified CM subscribers detect the failure and continue to function locally with no impact to local call processing and call control. However, any calls that were set up and are sending the active voice path media over the visible WAN link fail with the link. When the call fails, the Unified CCE PG sees the call drop and writes a Termination Call Detail (TCD) record in the Unified CCE database for that call at the time it is dropped.
      • The Call Routers and Peripheral Gateways detect the private network failure after missing five consecutive TCP keep-alive messages. These TCP keep-alive messages are generated every 100 ms, and the failure is detected within about 500 ms on this link.
      • The Call Routers attempt to contact their Peripheral Gateways with the test-other-side message to determine if the failure was a network issue or if the remote Call Router had failed and was no longer able to send TCP keep-alive messages. The Call Routers determine which side continues to be active (typically, this is the A-Side of the system because it is the side with the most active Peripheral Gateway connections), and that side stays active in simplex mode while the remote Call Router and PGs are in isolated-disabled mode. The Call Routers send a message to the Peripheral Gateways to realign their data feeds to the active Call Router only.
      • The Peripheral Gateways determine which side has the active Unified CM connection. However, each side also considers the state of the Call Router, and a Peripheral Gateway does not remain active if it is not able to connect to an active Call Router. Typically, this forces the A-Side PGs into active simplex (enabled) mode and the B-Side into isolated-disabled mode.
      • The surviving Call Router and Peripheral Gateways detect the failure of the visible network by the loss of TCP keep-alives on the visible network. These keep-alives are sent every 400 ms so it can take up to two seconds before this failure is detected.
      • The Call Router only sees the local Peripheral Gateways, which are those used to control local Unified IP IVRs or Unified CVP Call Servers and the local half of the Unified CM cluster. The remote Unified IP IVRs or Unified CVP Call Servers are off-line with no Unified CCE Call Control via the GED-125 IVR PG interface. The Unified CCE Call Routing Scripts automatically route around these off-line devices using the peripheral-on-line status checks. Calls that were in progress in the off-line IP-IVRs either drop or use the local default script in the IP-IVR or the Call Forward on Error settings in Unified CM. Calls under Unified CVP control from the off-line Call Servers get treatment from the survivability TCL script in their ingress Voice Gateways. For calls that were in progress but are no longer visible to Unified CCE, a Termination Call Detail (TCD) record is written to the Unified CCE database for the call data up to the time of the failure. If the default or survivability scripts redirect the calls to another active Unified CCE component, the call appears as a "new call" to the system with no relationship to the original call for reporting or tracking purposes.
      • Any new calls that come into the disabled side are not routed by Unified CCE, but they can be redirected or handled using the standard Unified CM redirect-on-failure for their CTI route points or the Unified CVP survivability TCL script in the ingress Voice Gateways.
      • Agents are impacted as noted above if their IP phones are registered to the side of the Unified CM cluster opposite the location of their active Peripheral Gateway and CTI OS Server connection. Only agents that were active on the surviving side of the Peripheral Gateway with phones registered locally to that site are not impacted.

      At this point, the Call Router and Unified CM Peripheral Gateway run in simplex mode and the system accepts new calls from only the surviving side for Unified CCE call treatment. The Unified IP IVR/Unified CVP functionality is also limited to the surviving side.
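The two detection times quoted in this scenario follow directly from the keep-alive intervals. The arithmetic below is a worked check of those figures; it assumes the same five-miss threshold applies to both links (the text states this explicitly for the private link and implies it for the visible link).

```python
MISS_THRESHOLD = 5            # consecutive missed keep-alives before failure is declared

PRIVATE_KEEPALIVE_MS = 100    # private-network keep-alive interval
VISIBLE_KEEPALIVE_MS = 400    # visible-network keep-alive interval

# Worst-case detection times:
private_detection_ms = MISS_THRESHOLD * PRIVATE_KEEPALIVE_MS   # 500 ms ("about 500 ms")
visible_detection_ms = MISS_THRESHOLD * VISIBLE_KEEPALIVE_MS   # 2000 ms ("up to two seconds")
```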

      Scenario 4: Unified CCE Agent Site WAN (Visible Network) Failure

      The Unified CCE design model for clustering over the WAN assumes the Unified CCE agents are remotely located at multiple sites connected by the visible WAN. Each agent location requires WAN connectivity to both of the data center locations across the visible WAN where the Unified CM and Unified CCE components are located. These connections provide redundancy and also allow basic SRST functionality in the event of a complete network failure, so that the remote site still has basic dial-tone service to make emergency (911) calls.

      If Side A of the WAN at the Unified CCE Agent Site fails, the following conditions apply:
      • Any IP phones that are homed to the side-A Unified CM subscribers automatically re-home to the side-B subscribers (provided the redundancy group is configured).
      • Agent desktops that are connected to the CTI OS or Cisco Agent Desktop Server at that site automatically realign to the redundant CTI OS Server at the remote site. (Agent desktops are disabled during the realignment process.)
      If both sides of the WAN at the Unified CCE Agent Site fail, the following conditions apply:
      • The local Voice Gateway detects the failure of the communications path to the Unified CM cluster and goes into SRST mode to provide local dial-tone functionality. With Unified CVP, these gateways detect the loss of the Unified CVP Call Server and execute their local survivability TCL script to reroute the inbound calls. Active calls in Unified CVP locally are no longer visible to Unified CCE, so a Termination Call Detail (TCD) record is written to the Unified CCE database at the time of the failure and tracking of the call stops at that point. The call executes the local survivability TCL script, which could redirect it using the PSTN to another Unified CCE site that remains active; however, the call then appears as a "new call" to Unified CCE and has no relationship with the original call information. If the call is retained locally and redirected by way of SRST to a local phone, Unified CCE does not have visibility to the call from that point forward.
      • The agent desktop detects the loss of connectivity to the CTI OS Server (or Cisco Agent Desktop Server) and automatically logs the agent out of the system. While the IP phones are in SRST mode, they are not able to function as Unified CCE agents.

      Understanding Failure Recovery

      This section analyzes the fail-over recovery of each individual part (products and subcomponents inside each product) of the Unified CCE solution.

      Unified CM Service

      In larger deployments, it is possible that the Unified CM to which the agent phones are registered is not running the CTI Manager service that communicates with the Unified CM Peripheral Gateway for Unified CCE. When an active Unified CM (call processing) service fails, all the devices registered to it are reported "out of service" by the CTI Manager service locally and to any external client such as the Peripheral Gateway on a different subscriber CTI Manager service.

      Unified CM call detail reporting (CDR) shows the call as terminated when the Unified CM failure occurred, although the call may have continued for several minutes after the failure because calls in progress stay in progress. IP phones of agents not on calls at the time of failure quickly register with the backup Unified CM subscriber. The IP phone of an agent on a call at the time of failure does not register with the backup Unified CM subscriber until after the agent completes the current call. If MGCP, H.323, or SIP gateways are used, then the calls in progress survive, but further call control functions (hold, retrieve, transfer, conference, and so on) are not possible.

      Unified CCE also writes a call record to the Termination Call Detail (TCD) table because Unified CM has reported the call as terminated to the Unified CCE PG. If the call continues after the PG has failed over, a second TCD record is written as a "new call" not related to the original call.

      When the active Unified CM subscriber fails, the PG receives out-of-service events from Unified CM and logs out the agents. To continue receiving calls, the agents must wait for their phones to re-register with a backup Unified CM subscriber, then log back in to their Unified CCE desktop application to restore its functionality. On recovery of the primary Unified CM subscriber, the agent phones re-register to their original subscriber, returning the cluster to the normal state with phones and devices properly balanced across multiple active subscribers.

      In summary, the Unified CM call processing service is separate from the CTI Manager service which connects to the Unified CM PG through JTAPI. The Unified CM call processing service is responsible for registering the IP phones and its failure does not affect the Unified CM PGs. From a Cisco Unified CCE perspective, the PG does not go off-line because the Unified CM server running CTI Manager remains operational. Therefore, the PG does not need to fail-over.

      Unified IP IVR

      When a CTI Manager service fails, the Unified IP IVR JTAPI subsystem shuts down and restarts, attempting to connect to the secondary CTI Manager service on a backup Unified CM subscriber in the cluster. In addition, all voice calls at this Unified IP IVR are dropped. If a secondary CTI Manager service is available on a backup subscriber, the Unified IP IVR logs in to that CTI Manager service and re-registers all the CTI ports associated with the Unified IP IVR JTAPI user. After all the Unified CM devices are successfully registered with the Unified IP IVR JTAPI user, the server resumes its Voice Response Unit (VRU) functions and handles new calls. This failure does not impact Unified CVP because Unified CVP does not depend on the Unified CM CTI Manager service for call control.

      Unified IP IVR Release 3.5 provided cold-standby redundancy, and Release 4.0 provides hot-standby redundancy, but these configurations are not supported for use with Unified CCE. These designs use a redundant server that sits unused unless the primary Unified IP IVR server fails. During fail-over processing, all calls that are in queue or treatment on the Unified IP IVR are dropped. A more resilient design is to deploy two or more Unified IP IVR servers and keep them all active, allowing Unified CCE to load-balance calls across them automatically. As shown in Figure 1, if one of the Unified IP IVR servers fails, only the calls on that server fail; the other servers remain active and continue to accept new calls into the system.

      Unified CCE

      Unified CCE is a collection of services and processes running on Unified CCE servers. The fail-over and recovery process for each of these services is unique and requires careful examination to understand the impact to other parts of the Unified CCE solution (including another Unified CCE service).

      Unified CM PG and CTI Manager Service

      When the active CTI Manager service or PG software fails, the PG JTAPI Gateway or PIM detects an OUT_OF_SERVICE event and induces a fail-over to the redundant (duplex) PG. Because the redundant PG is already logged in to the backup Unified CM subscriber CTI Manager service, it automatically registers the IP phones and configured dialed numbers or CTI route points. This initialization takes place at a rate of about 5 devices per second. The agent desktops show the agents as logged out or not ready, and a message displays stating that their routing client or peripheral (Unified CM) has gone off-line. (This warning can be turned on or off depending on the administrator's preference.) All agents and supervisors lose their desktop third-party call control functionality until the failure recovery is complete. The agents and supervisors can recognize this event because the call control action buttons on the desktop gray out and the desktop cannot be used. Any existing calls remain active without any impact to the caller.
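The re-registration rate above gives a way to estimate how long a PG fail-over takes for a given deployment. The following is a minimal sketch, assuming only the approximate 5-devices-per-second rate cited in the text; actual throughput varies by release and load.

```python
# Rough estimate of PG fail-over device re-registration time, based on
# the ~5 devices per second rate cited above. Illustrative only.

def estimated_reregistration_seconds(num_devices: int,
                                     devices_per_second: float = 5.0) -> float:
    """Return the approximate time to re-register all phones and
    CTI route points after a Unified CM PG fail-over."""
    if devices_per_second <= 0:
        raise ValueError("rate must be positive")
    return num_devices / devices_per_second

# Example: a site with 900 agent phones and 100 CTI route points.
print(estimated_reregistration_seconds(1000))  # 200.0 (about 3.3 minutes)
```

A calculation like this helps set expectations for how long agents see grayed-out desktops during a fail-over.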

      If calls arrive at the CTI Route Points in Unified CM during a PG fail-over and the PIM is not yet fully operational, these calls fail unless the route points are configured with a recovery number in their "Call Forward on Unregistered" or "Call Forward on Failure" setting. These recovery numbers can point to the Cisco Unity voicemail system for the Auto Attendant (or perhaps the company operator position) to ensure that incoming calls are answered.


      Note


      Do not push any buttons during desktop fail-over because these keystrokes can be buffered and sent to the CTI server when it completes its fail-over and restores the agent states.


      When an active PG fails over to the idle side, calls still in progress are recovered by querying Unified CM as part of the activation sequence. A single Termination Call Detail record provides information about the call after the PG transition when the call terminates. Peripheral call variables and ECC variables are maintained on the agent desktop. Call Type indications of Transfer, Barge In, Intercept, Supervisor Assist, and Emergency Assist are not recovered in the desktop or in reporting after the fail-over. In most cases, after a PG fail-over, agents whose states are Available or Wrap Up are moved to Available. Alternatively, agents may receive a prompt to log in or to change their state from Not Ready to Available. Agents can release, transfer, or conference calls from their agent desktop after activation completes. During conference tear-down, a call appearance of an active call may be removed from the desktop, but agent state is not affected. Calls that ended while the PG was down are cleared by a dead-call timeout after two hours.


      Note


      Call and agent state information might not be complete at the end of a fail-over if there are call status and agent state changes during the fail-over window.


      Unified CCE Voice Response Unit PG

      When a Voice Response Unit (VRU) PG fails, all the calls currently in queue or treatment on that Unified IP IVR are dropped unless a default script application is defined or the CTI Ports have a recovery number defined in Unified CM for their "Call Forward on Failure" setting. Calls in progress or queued in Unified CVP are not dropped; they are redirected to a secondary Unified CVP or to a number in the H.323 or SIP dial plan, if available, by the survivability TCL script in the Voice Gateway.

      The redundant (duplex) VRU PG side connects to the Unified IP IVR or Unified CVP and begins processing new calls upon fail-over. On recovery of the failed VRU PG side, the currently running VRU PG continues to operate as the active VRU PG. Therefore, having redundant VRU PGs adds significant value because it allows a Unified IP IVR or Unified CVP to continue to function as an active queue point or to provide call treatment. Without VRU PG redundancy, a VRU PG failure would block use of that IP IVR even though the IP IVR is working properly.

      Figure 19. Redundant Unified CCE VRU PGs with Two IP IVR Servers

      Unified CCE Call Router and Logger

      The Unified CCE Central Controllers or Unified CCE Servers are shown in these diagrams as a single set of redundant servers. However, depending on the size of the implementation, they can be deployed with multiple servers to host the following key software processes:
      • Unified CCE Call Router: The Unified CCE Call Router is the brain of the system; it maintains a constant memory image of the state of all the agents, calls, and events in the system. It performs the call routing in the system, executing the user-created Unified CCE routing scripts and populating the real-time reporting feeds for the Administration & Data Server. The Call Router software runs in synchronized execution, with both redundant servers maintaining the same memory image of the current state across the system. They keep this information updated by passing state events between the servers on the private LAN connection.
      • Unified CCE Logger and Database Server: The Unified CCE Logger and Database Server maintains the system database for the configuration (agent IDs, skill groups, call types, and so forth) and scripting (call flow scripts), as well as the historical data from call processing. The Loggers receive data from their local Call Router process to store in the system database. Because the Call Routers are synchronized, the Logger data is also synchronized. If the two Logger databases fall out of synchronization, they can be resynchronized manually by using the Unified ICMDBA application over the private LAN. The Logger also replicates its historical data to the customer Administration & Data Server over the visible network.
      In the event that one of the Unified CCE Call Routers fails, the surviving server detects the failure after missing five consecutive TCP keep-alive messages on the private LAN. The Call Routers generate these TCP keep-alive messages every 100 ms, so it takes up to 500 ms to detect this failure. On detection of the failure, the surviving Call Router contacts the Peripheral Gateways in the system to verify the type of failure that occurred. The loss of TCP keep-alive messages on the private network can be caused by either of the following conditions:
      • Private network outage — It is possible for the private LAN switch or WAN to be down but for both of the Unified CCE Call Routers to still be fully operational. In this case, the Peripheral Gateways still see both of the Unified CCE Call Routers even though they cannot see each other over the private network to provide synchronization data. If the disabled synchronizer (Call Router B) can communicate with a majority of the PGs, it then sends a Test Other Side (TOS) message to the PGs sequentially to determine if the Call Router on the other side (Side A) is enabled. If Call Router B receives a message that Side A is in fact enabled, then Call Router A runs in simplex until the private network is restored. If all the PGs reply to the TOS message and indicate that Side A is down, then Side B re-initializes in simplex mode.
      • Call Router hardware failure — It is possible for the Call Router on the other side to have a physical hardware failure and be completely out of service. In this case, the Peripheral Gateways report that they can no longer see the Call Router on the other side, and the surviving Call Router takes over the active processing role in simplex mode. The Call Routers detect this failure through the loss of heartbeat keep-alives on the private network.
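The Test Other Side arbitration described in these two conditions can be sketched as a small decision function. This is illustrative only, not Cisco code; the `PGReport` type and return strings are hypothetical names for the states the text describes.

```python
# Sketch of the Test Other Side (TOS) arbitration: the disabled
# synchronizer (Call Router B) must reach a majority of PGs, then asks
# them whether Side A is still enabled. Illustrative only.

from dataclasses import dataclass
from typing import List

@dataclass
class PGReport:
    reachable: bool            # can Call Router B contact this PG?
    other_side_enabled: bool   # does this PG still see Call Router A?

def arbitrate(reports: List[PGReport]) -> str:
    reachable = [r for r in reports if r.reachable]
    # B may act on TOS results only with a strict majority of PGs.
    if len(reachable) * 2 <= len(reports):
        return "remain-disabled"           # no majority view of the system
    if any(r.other_side_enabled for r in reachable):
        return "other-side-runs-simplex"   # Side A alive; private LAN is down
    return "go-active-simplex"             # every reachable PG reports A down

# Example: three PGs, all reachable, none can see Side A.
print(arbitrate([PGReport(True, False)] * 3))  # go-active-simplex
```

The key design point is that the PGs act as witnesses: the private-network outage and the hardware failure look identical on the private LAN, and only the visible-network view from the PGs can tell them apart.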

      During Call Router fail-over processing, any Route Requests sent to the Call Router from a Carrier Network Interface Controller (NIC) or Peripheral Gateway are queued until the surviving Call Router is in active simplex mode. Any calls in progress in the IVR or at an agent are not impacted.

      If one of the Unified CCE Logger and Database Servers fails, there is no immediate impact except that the local Call Router is no longer able to store data from call processing. The redundant Logger continues to accept data from its local Call Router. When the failed Logger server is restored, it contacts the redundant Logger to determine how long it was off-line. If the Logger was off-line for less than 12 hours, it automatically requests from the redundant Logger all the transactions it missed while off-line. The Loggers maintain a recovery key that tracks the date and time of each entry recorded in the database, and these keys are used to restore data to the failed Logger over the private network.

      If the Logger was off-line for more than 12 hours, the system does not automatically resynchronize the databases. In this case, resynchronization is done manually using the Unified ICMDBA application. Manual resynchronization allows the system administrator to decide when to perform this data transfer on the private network, perhaps scheduling it during a maintenance window when there is little call processing activity in the system.
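The recovery decision above can be sketched as a simple threshold check on the elapsed off-line time, which the Loggers derive from their recovery keys. This is a minimal sketch under the 12-hour rule stated in the text; the function and label names are hypothetical.

```python
# Sketch of the Logger recovery decision: automatic resynchronization
# from the peer Logger within 12 hours, manual (Unified ICMDBA) beyond
# that. Illustrative only.

from datetime import datetime, timedelta

AUTO_RESYNC_WINDOW = timedelta(hours=12)

def recovery_action(last_recovery_key_time: datetime,
                    now: datetime) -> str:
    """Decide how a restored Logger catches up, based on how long it
    was off-line (tracked via recovery keys in the database)."""
    offline = now - last_recovery_key_time
    if offline <= AUTO_RESYNC_WINDOW:
        return "auto-resync-from-peer"   # missed rows pulled over the private LAN
    return "manual-icmdba-resync"        # administrator schedules the transfer

now = datetime(2013, 6, 1, 20, 0)
print(recovery_action(datetime(2013, 6, 1, 9, 0), now))   # auto-resync-from-peer
print(recovery_action(datetime(2013, 5, 31, 6, 0), now))  # manual-icmdba-resync
```

The manual path exists so that a large catch-up transfer can be scheduled for a maintenance window rather than competing with call processing on the private network.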

      The Logger replication process that sends data from the Logger database to the HDS database on the Administration & Data Servers also replicates each new row written to the Logger database when this resynchronization takes place.

      There is no impact to call processing during a Logger failure; however, the historical data on the Administration & Data Server that is replicated from that Logger stops until the Logger is restored.

      Additionally, if the Unified Outbound Option is used, the Campaign Manager software is loaded on Logger A only. If that platform is out of service, any outbound calling stops until the Logger is restored to operational status.

      Administration and Data Server

      The Administration & Data Server provides the user interface to the system for making configuration and scripting changes. It can also host the web-based reporting tool and Internet Script Editor.

      These servers do not support redundant or duplex operation as the other Unified CCE system components do. However, you can deploy multiple Administration & Data Servers to provide redundancy for Unified CCE.

      Figure 20. Redundant Unified CCE Administration & Data Servers

      Administration & Data Server Real-Time Distributors are clients of the Unified CCE Call Router real-time feed that provides real-time information about the entire Unified CCE across the enterprise. Real-Time Distributors at the same site can be set up as part of an Admin Site that includes a designated primary real-time distributor and one or more secondary real-time distributors. Another option is to add Administration Clients that do not have their own local SQL databases and are homed to a Real-Time Distributor locally for their SQL database and real-time feed.

      The Admin Site reduces the number of real-time feed clients the Unified CCE Call Router has to service at a particular site. For remote sites, this is important because it can reduce the required bandwidth to support remote Administration & Data Servers across a WAN connection.

      When using an Admin Site, the primary Administration & Data Server registers with the Unified CCE Call Router for the real-time feed, and the other Administration & Data Servers within that Admin Site register with the primary Administration & Data Server for the real-time feed. If the primary real-time distributor is down, the secondary real-time distributors register with the Unified CCE Call Router for the real-time feed. Administration Clients that cannot register with the primary or secondary Administration & Data Server cannot perform any tasks until the distributors are restored.
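The registration order above can be sketched as a short fallback function. This is illustrative only; the role and source labels are hypothetical names for the components the text describes.

```python
# Sketch of the real-time-feed registration order in an Admin Site:
# secondaries prefer the primary distributor and fall back to the Call
# Router; Administration Clients can use only the distributors.

def choose_feed_source(role: str, primary_up: bool, secondary_up: bool):
    if role == "secondary-distributor":
        # Register with the primary if it is up, else directly with
        # the Call Router.
        return "primary-distributor" if primary_up else "call-router"
    if role == "admin-client":
        # Administration Clients have no local database and depend
        # entirely on a distributor.
        if primary_up:
            return "primary-distributor"
        if secondary_up:
            return "secondary-distributor"
        return None  # no tasks possible until a distributor is restored
    # The primary distributor itself always registers with the Call Router.
    return "call-router"

print(choose_feed_source("secondary-distributor", False, True))  # call-router
print(choose_feed_source("admin-client", False, False))          # None
```

The asymmetry is the point of the Admin Site design: only distributors ever fall back to the Call Router, which caps the number of real-time feed clients the Call Router must service across a WAN.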

      Alternatively, each Administration & Data Server can be deployed in its own Admin Site, regardless of the physical site of the device. This deployment creates more overhead for the Unified CCE Call Router, which must maintain multiple real-time feed clients; however, it prevents a failure of the primary Administration & Data Server from taking down the secondary Administration & Data Server at the site.

      Additionally, if the Administration & Data Server is used to host the ConAPI interface for the Cisco Unified Contact Center Management Portal (Unified CCMP), any configuration changes made to the Unified CCE or Unified CCMP systems are not passed over the ConAPI interface until that server is restored.

      CTI Server

      The CTI Server monitors the data traffic of the Unified CM PIM on the Agent PG for specific CTI messages (such as call ringing or off-hook events) and makes those messages available to CTI clients such as the CTI OS Server or Cisco Agent Desktop Enterprise Server. It also processes third-party call control messages (such as make call or answer call) from the CTI clients and sends those messages by using the PIM interface of the PG to Unified CM to process the event on behalf of the agent desktop.

      CTI Server is redundant and co-resident on the Agent PG servers. (See Figure 66.) It does not, however, maintain agent state in the event of a failure. On failure of the CTI Server, the redundant CTI Server becomes active and begins processing call events. Both CTI OS and Finesse Servers are clients of the CTI Server and are designed to monitor both CTI Servers in a duplex environment and to maintain the agent state during fail-over processing. CTI OS agents see their desktop buttons dim during the fail-over to prevent them from attempting tasks while the CTI Server is down. The buttons are restored as soon as the redundant CTI Server is active, and the agent does not have to log on again to the desktop application. Most call context is maintained, but ANI and DNIS are lost in this case, where only the CTI Server component is impacted.

      Finesse servers return an OUT_OF_SERVICE status to clients during the fail-over, preventing the clients from initiating actions. The Finesse Desktop user interface retains its last state until the redundant CTI Server is restored, at which time the Finesse server updates each client with the current CTI state.

      Figure 21. Redundant CTI Servers Coresident on Agent PG

      CTI OS Considerations

      CTI OS Server is a software component that runs co-resident on the Unified CM Peripheral Gateway. CTI OS Server software is designed to be fault-tolerant and is typically deployed on redundant physical servers; however, unlike the PG processes that run in hot-standby mode, both of the CTI OS Server processes run in active mode all the time. The CTI OS Server processes are managed by Node Manager, which monitors each process running as part of the CTI OS service and which automatically restarts abnormally terminated processes.

      CTI OS handles fail-over of related components as described in the following scenarios (see figure below).

      Figure 22. Redundant CTI OS Server Processes

      Scenario 1: CTI Server Side A (Active) Fails

      In this scenario, CTI Server Side A is co-resident on PG Side A, and the following events occur:
      • CTI Server Side B detects the failure of Side A and becomes active.
      • Node Manager restarts CTI Server Side A, which becomes idle.
      • Both CTI OS Server Sides A and B drop all CTI OS client and agent connections and restart after losing the connection to CTI Server A. At startup, CTI OS Server Sides A and B stay in CONNECTING state until they connect to CTI Server Side B, and then they go into CONFIGURING state where they download agent and call states and configuration information. CTI OS Client connections are not accepted by CTI OS Server A and B during CONNECTING and CONFIGURING states. When CTI OS Server synchronizes with CTI Server, the state becomes ACTIVE and it is now ready to accept CTI OS Client connections.
      • Both CTI OS Clients 1 and 2 lose their connections to the CTI OS Servers, and each randomly selects one CTI OS Server to connect to. CTI OS Client 1 can be connected to either CTI OS Server A or B, and the same is true for CTI OS Client 2. During this transition, the buttons of the CTI Toolkit Agent Desktop are disabled and return to the operational state as soon as the desktop is connected to a CTI OS Server.
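The server-side recovery sequence in this scenario (CONNECTING, then CONFIGURING, then ACTIVE, with client connections refused until the end) can be sketched as a minimal state machine. Illustrative only; the class and method names are hypothetical.

```python
# Minimal state-machine sketch of the CTI OS Server recovery sequence
# described above. Client connections are refused until the server has
# downloaded agent/call state and reached ACTIVE.

class CtiOsServer:
    def __init__(self):
        self.state = "CONNECTING"

    def on_cti_server_connected(self):
        # Connected to the surviving CTI Server; start downloading
        # agent and call states and configuration information.
        self.state = "CONFIGURING"

    def on_configuration_complete(self):
        # Synchronized with the CTI Server; ready for clients.
        self.state = "ACTIVE"

    def accepts_clients(self) -> bool:
        return self.state == "ACTIVE"

server = CtiOsServer()
print(server.accepts_clients())      # False (CONNECTING)
server.on_cti_server_connected()
print(server.accepts_clients())      # False (CONFIGURING)
server.on_configuration_complete()
print(server.accepts_clients())      # True (ACTIVE)
```

Refusing clients during CONFIGURING matters because a client admitted early would see stale or missing agent state.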

      Note


      When the active agent peripheral’s CG loses network connectivity to the agent desktops, the system registers that agents are not connected within one minute and waits one more minute for possible re-connection before it fails over to the other side.


      Scenario 2: CTI Server B (Idle) Fails

      In this scenario, CTI Server Side B is co-resident on PG Side B but is not the active side. The following events occur:
      • CTI Server Side A stays active.
      • Node Manager restarts CTI Server Side B, which stays idle.
      • Neither CTI OS Clients nor CTI OS Servers are affected by this failure.

      Scenario 3: CTI OS Server A Fails

      In this scenario, CTI OS Server Side A processes are co-resident on PG/CTI Server Side A. The following events occur:
      • CTI OS Client 1 detects the loss of network connection and automatically connects to CTI OS Server B. During this transition, the buttons of the CTI Toolkit Agent Desktop are disabled and return to the operational state as soon as it is connected to CTI OS Server B.
      • CTI OS Client 2 stays connected to CTI OS Server B.
      • NodeManager restarts CTI OS Server A.

      Scenario 4: CTI OS Server B Fails

      In this scenario, CTI OS Server Side A processes are co-resident on PG/CTI Server Side B. The following events occur:
      • CTI OS Client 2 detects the loss of network connection and automatically connects to CTI OS Server A. During this transition, the buttons of the CTI Toolkit Agent Desktop are disabled and return to the operational state as soon as it is connected to CTI OS Server A.
      • CTI OS Client 1 stays connected to CTI OS Server A.
      • NodeManager restarts CTI OS Server B.

      Scenario 5: CTI OS Client 1 Fails

      In this scenario, the following events occur:
      • The agent manually restarts CTI OS Client 1 application.
      • CTI OS Client 1 randomly selects one CTI OS Server to connect to. (CTI OS Client 1 can be connected to either CTI OS Server A or B.)
      • Once connected, the agent logs in and CTI OS Client 1 recovers its state by getting agent and call states through the CTI OS Server to which it is connected.

      Scenario 6: CTI OS Client 2 Fails

      In this scenario, the following events occur:
      • The agent manually restarts CTI OS Client 2 application.
      • CTI OS Client 2 randomly selects one CTI OS Server to connect to. (CTI OS Client 2 can be connected to either CTI OS Server A or B.)
      • Once connected, the agent logs in and CTI OS Client 2 recovers its state by getting agent and call states through the CTI OS Server to which it is connected.

      Scenario 7: Network Failure between CTI OS Client 1 and CTI OS Server A

      In this scenario, the following events occur:
      • CTI OS Server A drops the connection of CTI OS Client 1.
      • CTI OS Client 1 detects the loss of network connection and automatically connects to CTI OS Server B. During this transition, the buttons of the CTI Toolkit Agent Desktop are disabled and return to the operational state as soon as it is connected to CTI OS Server B.

      Scenario 8: Network Failure between CTI OS Client 1 and CTI OS Server B

      CTI OS Client 1 is not affected by this failure because it is connected to CTI OS Server A.

      Scenario 9: Network Failure between CTI OS Client 2 and CTI OS Server A

      CTI OS Client 2 is not affected by this failure because it is connected to CTI OS Server B.

      Scenario 10: Network Failure between CTI OS Client 2 and CTI OS Server B

      In this scenario, the following events occur:
      • CTI OS Server B drops the connection of CTI OS Client 2.
      • CTI OS Client 2 detects the loss of network connection and automatically connects to CTI OS Server A. During this transition, the buttons of the CTI Toolkit Agent Desktop are disabled and return to the operational state as soon as the desktop is connected to CTI OS Server A.

      Note


      If no clients are connected to the active CTI Server, a default switch occurs every "NoClientConnectionTimeout" seconds. This behavior isolates any spurious reasons that the CTI clients cannot connect to the active CTI Server.


      Cisco Finesse Considerations

      Cisco Finesse is a software component that runs exclusively in a dedicated virtual machine, either stand-alone or coresident on the Unified CM Peripheral Gateway server (if virtualized). The Cisco Finesse software is designed to be fault-tolerant and is deployed on redundant physical servers. Both Finesse servers are active. Cisco Finesse runs on the Cisco VOS platform, where the local servm process restarts any failed Finesse processes.

      Finesse handles the fail-over of related components as described in the following scenarios (see figure below).

      Figure 23. Redundant Finesse Processes

      Scenario 1: CTI Server Side A (Active) Fails

      In this scenario, CTI Server Side A is coresident on PG Side A, and the following events occur:
      • CTI Server Side B detects the failure of Side A and becomes active.
      • Node Manager restarts CTI Server Side A, which becomes idle.
      • Both Finesse Server Sides A and B go into OUT_OF_SERVICE state and do not allow any further client actions or sign-in attempts. The Finesse servers attempt to reconnect to CTI Server A and, on a reconnect failure, attempt to connect to CTI Server B. After the connection to CTI Server B is established, both Finesse servers rebuild their internal agent state through CTI Server B. After the internal state is rebuilt, the Finesse servers transition to IN_SERVICE and allow new sign-in attempts as well as any CTI actions. Each connected Finesse client also receives an XMPP update with the refreshed agent and call state as received from CTI Server B.

      Scenario 2: CTI Server B (Idle) Fails

      In this scenario, CTI Server Side B is coresident on PG Side B but is not the active side. The following events occur:
      • CTI Server Side A stays active.
      • Node Manager restarts CTI Server Side B, which stays idle.
      • Finesse clients and Finesse servers are not affected by this failure.

      Scenario 3: Finesse Server A Fails

      In this scenario, Finesse Server A fails. The following events occur:
      • The Finesse Desktop application detects the loss of network connection to the Finesse server and automatically connects to Finesse Server B. Fail-over is handled through an HTML redirect to the sign-in page of the Finesse Desktop application on Finesse Server B. Agents are prompted to enter their sign-in credentials.
      • After the agents sign in to Finesse Server B, their desktops are updated to reflect their current state.
      • All Finesse clients remain on Finesse Server B until they sign out of their current session.
      • Third-party applications that use the Finesse REST API must perform the fail-over within their application logic to move to Finesse Server B. Finesse provides a REST API call that contains the host addresses of Finesse Servers A and B.
      • If the cause of the failure on Finesse Server A was the Cisco Tomcat process stopping for any reason, the local servm process attempts to restart Cisco Tomcat.
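For third-party REST API consumers, the fail-over logic described above can be sketched as a try-in-order loop over the two host addresses that Finesse reports. This is a hedged sketch: the `probe` callable stands in for the real REST call (the text says Finesse provides one that returns both host addresses); consult the Finesse API documentation for the actual resource and schema.

```python
# Sketch of client-side fail-over for a third-party Finesse REST API
# consumer. The probe function is a stand-in for a real REST request;
# all names here are illustrative, not part of the Finesse API.

def connect_with_failover(hosts, probe):
    """Try each Finesse server in turn; return the first host whose
    probe succeeds. Raise if neither side is reachable."""
    for host in hosts:
        try:
            probe(host)          # e.g., an HTTPS request to the server
            return host
        except OSError:
            continue             # server unreachable; try the other side
    raise RuntimeError("no Finesse server reachable")

# Example with a stand-in probe: Server A is down, Server B responds.
def fake_probe(host):
    if host == "finesseA.example.com":   # hypothetical host names
        raise OSError("connection refused")

print(connect_with_failover(
    ["finesseA.example.com", "finesseB.example.com"],
    fake_probe))  # finesseB.example.com
```

The browser-based desktop gets this behavior for free through the HTML redirect; API-only applications must implement an equivalent loop themselves, which is the point the bullet above makes.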

      Scenario 4: Finesse Server B Fails

      In this scenario, all Finesse clients fail over to Finesse Server A with the same set of server state transitions and client updates as in Scenario 3.

      Scenario 5: Finesse Client Fails

      Finesse runs in a web browser on the client desktop. If the browser process fails for any reason, the following occurs:
      • After a 10-second timeout, the Finesse server automatically instructs the CTI Server to sign the user out.
      • If the user is currently active on a call, the automatic sign-out occurs after the call is complete.
      • The user may restart the browser and sign in again to the Finesse server. The desktop is then updated to reflect the current state of the user.

      Scenario 6: Network Failure Between Finesse Client and Finesse Server A

      In this scenario, the following events occur:
      • Finesse Server A automatically signs the user out of the CTI Server unless the user is active on a call.
      • The Finesse client detects the loss of connectivity and automatically redirects the browser to Finesse Server B.
      • The user is prompted for sign-in credentials.
      • The Finesse Desktop application loads from Server B, and the current state of the user is updated (including agent state and call state).

      Scenario 7: Network Failure Between Finesse Client and Finesse Server B

      This scenario mirrors scenario 6. The Finesse client automatically redirects the user to Finesse Server A to sign in again.

      Cisco Agent Desktop Considerations

      Cisco Agent Desktop failover scenarios

      Cisco Agent Desktop client applications are clients of CTI OS, which provides automatic fail-over and redundancy for the Cisco Agent Desktop CTI connections. If the Unified Communications Manager Peripheral Gateway or CTI Server (CG) fails over, the Cisco Agent Desktop client application displays a logged-out state and automatically returns to a logged-in state when an operational connection is established with the alternate Unified Communications Manager Peripheral Gateway or CTI Server (CG). Consequently, the scenarios outlined in the CTI OS Considerations section apply.

      The Cisco Agent Desktop services (Enterprise, Chat, RASCAL, and so forth) can also be deployed redundantly to allow for fail-over of the core Cisco Agent Desktop components. The Cisco Agent Desktop client applications are aware of the redundant Cisco Agent Desktop services and automatically fail-over in the event of a Cisco Agent Desktop service process or hardware failure.

      The following services are active on only one side at a time:
      • Cisco Browser and IP Phone Agent Service
      • Cisco Chat Service
      • Cisco Enterprise Service
      • Cisco Licensing and Resource Manager Service
      • Cisco Recording and Statistics Service
      • Cisco Sync Service
      The following services are active on both sides at all times and are available to the Cisco Agent Desktop client applications as long as network connectivity is available:
      • Cisco LDAP Monitor Service
      • Cisco Recording & Playback Service
      • Cisco VoIP Monitor Service

      Active side Cisco License and Resource Manager service fails

      In this scenario, the following events occur:
      • The Cisco Agent Desktop services on Side A that are not always active go to an idle state.
      • The Cisco Agent Desktop services on Side B (idle) activate.
      • The Cisco Agent Desktop client applications recover to Side B.

      Active side Cisco Agent Desktop service fails twice in five minutes

      In this scenario, the following events occur:
      • The Cisco Agent Desktop services on Side A that are not always active go to an idle state.
      • The Cisco Agent Desktop services on Side B (idle) activate.
      • The Cisco Agent Desktop client applications recover to Side B.

      Active side Cisco Agent Desktop service down for three minutes

      In this scenario, the following events occur:
      • The Cisco Agent Desktop services on Side A that are not always active go to an idle state.
      • The Cisco Agent Desktop services on Side B (idle) activate.
      • The Cisco Agent Desktop client applications recover to Side B.

      Network failure between active side and idle side Cisco Agent Desktop service

      In this scenario, the following events occur:
      • The Cisco Agent Desktop services on Side A remain active.
      • The Cisco Agent Desktop services on Side B (idle) activate.
      • The Cisco Agent Desktop client applications remain connected to Cisco Agent Desktop services on Side A.
      • When network connectivity is restored between Sides A and B, the Cisco Licensing and Resource Manager Service renders the non-preferred side inactive. Recovery-side preference is configurable in Post Install.

      Cisco Agent Desktop Browser Edition and Unified IP Phone Agent

      Cisco Agent Desktop Browser Edition and Cisco Unified IP Phone Agent communicate with the CTI Server through the Cisco Browser and Unified IP Phone Agent service. When launching the desktop, the agent can use the URL for either side as long as that side is accessible. Once launched, the desktop automatically connects to the active side. When launching Unified IP Phone Agent, the agent must select the active side from the services menu on the phone. If the agent selects the idle side, an error message indicates that the selected side is idle and prompts the agent to try the other side.

      Idle side Cisco Agent Desktop services become active after failure

      In this scenario, the following events occur for a logged-in desktop agent:
      • The desktop applet changes to a logged-out state, and the user is notified that the connection has been lost.
      • The desktop applet automatically connects to the services on Side B and logs in the agent.
      In this scenario, the following events occur for a logged-in Unified IP Phone agent:
      • The phone agent is notified that the connection to the server has been lost.
      • The phone agent manually selects Side B from their services list and logs in again.

      Replacement of MSDE with Flat Files or SQL Database Server

      Because MSDE is no longer supported, at Post Install time the user must choose either flat files or a SQL database, and Post Install configures the system based on that selection. A value stored in LDAP indicates which implementation is selected. After the initial configuration is completed (the implementation is selected and saved), the user cannot change implementations. For the database implementation, Unified CCE configures the FCRasSvr database as before, and it continues to provide scripts for setup and teardown of database replication for high availability. The database implementation has three tables: agent state data, call log data, and recording metadata. In the flat-file implementation, each of these tables is represented by a set of text files. Regardless of which implementation a particular installation uses, the data stored is identical.

      There are currently no minimum or maximum file sizes set.


      Note


      Starting with Release 9.0(3), all customers with new deployments of any version of Cisco Agent Desktop must use SQL Server as the data store, not flat files. Deployments with a fully replicated SQL Server database provide a more complete feature set, better performance, and greater stability.


      Unified CCE high availability with Unified ICM Enterprise

      In a parent/child deployment, the Unified ICME acts as the parent controlling one or more Unified CCE child IP ACDs. In this model, the Unified ICME system is the network call routing engine for the contact centers, with network queuing provided by Unified CVP and with Unified CCE Gateway PGs connecting the child Unified CCE systems (either Unified CCE with a System PG or Unified CCX). The child Unified CCE systems are individual IP-ACD systems that remain fully functional with local call processing if they lose their WAN connection to the parent Unified ICME system. This configuration provides a high level of redundancy and availability, allowing sites to remain functional as Unified CCE sites even if they are cut off from centralized call processing resources.

      Figure 24. Parent/Child Deployment Model

      Parent/Child components

      The following sections describe the components used in Unified ICME (parent) and Unified CCE (child) deployments.


      Note


      Precision Routing is not supported in a parent/child deployment.


      Unified ICME (parent) data center

      The Unified ICME data center location contains the Unified ICME Central Controller, deployed as a redundant pair. A Central Controller consists of Call Router and Logger servers. These servers can be deployed as separate Call Routers and Loggers, and the redundant sides can be placed in two different data centers for additional geographic fault tolerance.

      The Unified ICME Central Controllers control Peripheral Gateways (PGs) at the data center location. A redundant pair of IVR PGs controls Unified CVP across the architecture. Additional PGs can be inserted at this layer to control TDM or legacy ACDs and IVRs, perhaps to support a migration to Unified CCE or to support out-source locations that still use the TDM or legacy ACDs. The Unified ICME parent at this level can also support standard pre-routing with inter-exchange carriers (IXCs) such as AT&T, MCI, and others, allowing Unified ICME to select the best target for the call while it is still in the carrier network.

      The Unified ICME parent is not designed to support any directly controlled agents in this model, which means that it does not support classic Unified CCE with a Unified CM PG installed on this Unified ICME parent. All agents must be controlled externally to this Unified ICME parent system.

      The Unified CVP or IVR PG pair controls the CVP Call Server which translates the IVR PG commands from Unified ICME into VoiceXML and directs the VoiceXML to the Voice Gateways (VGs) at the remote contact center sites. This allows calls from the data center location to come into the remote call centers under control of the Unified CVP at the parent location. The parent then has control over the entire network queue of calls across all sites and holds the calls in queue on the Voice Gateways at the sites until an agent becomes available.

      Unified CCX call center (child) site

      The Unified Contact Center Express (CCX) call center location contains a local Unified CM cluster that provides local IP-PBX functionality and call control for the IP phones and local Unified CVP VG. There is also a local Unified CCX server that provides IP-ACD functionality for the site. Prior to Unified CCX Server Release 8.0, the Unified CCX Server had the Unified CCE Gateway PG installed on it, which reduced the number of servers required to support this contact center site.

      Unified CCX 8.0(1) is deployed on the Unified Communications Manager Operating System platform, which requires the Unified CCE Gateway PG to be installed on a separate (Windows) server. For new and existing customers, this deployment model change requires the Unified CCE Gateway PG and the CCX ACMI Manager to be installed on separate (Windows) servers. In either deployment, the Unified CCE Gateway PG connects to the Unified ICME Call Router at the parent location over the WAN and provides real-time event data and agent states from the Unified CCX to the parent. The Unified CCE Gateway PG also captures configuration data (skill groups, CSQs, services, applications, and so forth) and sends it to the parent Unified ICME configuration database.

      Additional Unified CCX servers may be used and included in this site to provide redundant Unified CCX servers, historical reporting database services, recording and monitoring servers, and ASR/TTS servers. High-availability deployments of Unified CCX Release 8.0 or above require the deployment of two "Side A" Unified CCE Gateway PGs on separate (Windows) servers. The Unified CCX servers are configured with the IP Addresses of the two "Side A" Unified CCE Gateway PGs.

      Unified CCE call center (child) site

      The Unified CCE call center location contains a local Unified CM cluster that provides local IP-PBX functionality and call control for the IP phones and the local Unified CVP VG. There is also a local Unified IP IVR to provide local call queuing for the Unified CCE site. A redundant pair of Unified CCE Gateway PGs connects this site to the Unified ICME parent Central Controller over the WAN. The Unified CCE Gateway PGs can be deployed on separate servers or co-resident with the Unified CCE System PG, with the following caveats:

      • If the Unified CCE Gateway PG and Unified CCE System PG Instance Numbers are the same, then the PG numbers for the Unified CCE Gateway PG and Unified CCE System PG must be different.

      • If the Unified CCE Gateway PG and Unified CCE System PG Instance Numbers are different, then the PG numbers for the Unified CCE Gateway PG and Unified CCE System PG may be the same.

      • No additional PGs (such as a VRU PG or MR PG) can be added to this server.

      For scalability limits of the co-resident Unified CCE Gateway PG and Unified CCE System PG, see Sizing Unified CCE Components and Servers.
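The numbering caveats for a co-resident deployment can be expressed as a simple check. The following is an illustrative sketch only (the function and parameter names are not part of the product):

```python
def coresident_pg_config_valid(gw_instance, gw_pg_number,
                               sys_instance, sys_pg_number):
    """Return True if a co-resident Unified CCE Gateway PG and
    Unified CCE System PG satisfy the numbering caveats above."""
    if gw_instance == sys_instance:
        # Same Instance Number: the two PG numbers must differ.
        return gw_pg_number != sys_pg_number
    # Different Instance Numbers: the PG numbers may be the same.
    return True
```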

      The Unified CCE Gateway PGs provide real-time event data and agent states to the parent from the Unified CCE child. The Unified CCE Gateway PGs also capture configuration data (skill groups, services, call types, and so forth) and send it to the parent Unified ICME configuration database as well.

      The IP-IVR at the child site can be replaced with a local Unified CVP instance. Unified CVP is not integrated as part of the Agent Controller's System PG; there is a separate IVR PG defined specifically for Unified CVP as part of the installation for Unified CCE with Unified CVP. Because Unified CVP is not part of the System PG, calls in queue or treatment in Unified CVP are not reported to the parent Unified ICME through the Unified CCE Gateway PG.

      A local Unified CCE child system is used to provide IP-ACD functionality and it can be sized depending on the type of deployment required:
      • Progger configuration: a single server or duplex pair of servers that contains the following Unified CCE components: Call Router and Logger, System PG for Unified CM and IP IVR, CTI Server and CTI OS Server, and optionally the VRU PG for Unified CVP.
      • Rogger configuration: a separate Unified CCE Agent Controller (System PG, optional Unified CVP controller, and CTI/CTI OS Server). The Rogger configuration contains the Call Router and Logger as a single set of duplex Central Controllers, plus a separate Agent Controller set of duplex servers that contains the System PG for Unified CM and IP IVR, the CTI Server and CTI OS Server, and the optional VRU PG for Unified CVP.

      For more details about the capacity of these configurations, refer to Sizing Unified CCE Components and Servers.

      In either configuration, a separate Administration & Data Server is required to host the configuration and scripting tools for the system, as well as an optional Historical Database Server role and the Web-based Unified Intelligence Center reporting tool.

      Unified CCE Gateway PGs at Unified ICME data center

      The Unified CCE Gateway PG may be deployed at the Unified ICME data center, as illustrated in the following figure. One advantage of this deployment model is that the Unified CCE Gateway PGs can be centrally managed and controlled. This deployment is also necessary when the child and parent sites have different ownership and management; for example, when an outsourcer or service bureau manages the child site and connects to the Unified CCE Gateway PGs at the parent site.

      Figure 25. Parent/Child deployment with Unified CCE Gateway PGs at data center

      There are several drawbacks to moving the Unified CCE Gateway PGs to the data center. One is the loss of reporting data in the event of a network failure: if the network connection between the parent site and the Unified CCE System PGs at the child site drops, all reporting at the parent site is lost for that period.


      Note


      If the Unified CCE Gateway PG is deployed locally to the Unified CCE System PG and the connection between the Unified CCE Gateway PG and the parent site drops, the historical data in the parent site is updated when the network connection is restored.


      A second drawback with centralizing the Unified CCE Gateway PGs is that the network bandwidth requirements for the connections between the Unified CCE Gateway PG and the Unified CCE System PG are significantly higher. See the "Bandwidth Requirements for Unified CCE Gateway to System PG" section in the Bandwidth Provisioning and QoS Considerations chapter for additional details.

      Parent/Child call flows

      The following sections describe the call flows between the parent and child.

      Typical inbound PSTN call flow

      In a typical inbound call flow from the PSTN, calls are directed by the carrier network to the contact center sites using some predefined percent allocation or automatic routing method. These calls are terminated in the Unified CVP VGs at the call center locations under control of the Unified ICME parent Unified CVP.

      The inbound call flow is as follows:
      1. The call arrives on the Unified CVP VG at the Unified CCE call center location.
      2. The Unified CVP VG maps the call by dialed number to a particular Unified CVP Call Server at the Unified ICME parent site and sends a new call event to the Unified CVP Call Server.
      3. The Unified CVP Call Server sends the new call event message to the Unified CVP or IVR PG at the Unified ICME parent site.
      4. The Unified CVP PG sends the new call message to the Unified ICME parent, which uses the inbound dialed number to qualify a routing script to determine the proper call treatment (messaging) or agent groups to consider for the call.
      5. Unified ICME instructs Unified CVP to hold the call in the VG at the site until an agent becomes available, instructing the gateway to play .wav files as hold music to the caller.
      6. When an agent becomes available, the Unified ICME instructs Unified CVP to transfer the call to the site with the available agent by using a translation route. (The agent might not be at the same physical site but across the WAN.) Any data collected about the call in the Unified ICME parent Unified CVP is transferred to the remote system's PG (either a TDM, legacy PG, or one of the Unified CCE Gateway PGs for Unified CCX or Unified CCE).
      7. When the call arrives at the targeted site, it arrives on a specific translation route DNIS that was selected by the Unified ICME parent. The PG at the child site is expecting a call to arrive on this DNIS to match up with any pre-call CTI data associated with the call. The local ACD or Unified CCE performs a post-route request to the local PG to request the CTI data as well as the final destination for the call (typically the lead number for the skill group of the available agent).
      8. If the agent is no longer available for the call (walked away or unplugged), Unified CVP at the parent site uses the Router Re-query function in the ICM Call Routing Script to select another target for the call automatically.
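The translation-route matching in steps 6 and 7 can be sketched as a simple reservation table: the parent reserves a DNIS and pushes the pre-call CTI data ahead of the transfer, and the child PG matches the arriving call to that data by DNIS. All names in this sketch are illustrative, not product APIs:

```python
# Pending pre-call CTI data, keyed by the reserved translation-route DNIS.
pending_precall = {}

def reserve_translation_route(dnis, cti_data):
    # Parent side: associate the selected translation-route DNIS with the
    # call's CTI data before the call is transferred to the child site.
    pending_precall[dnis] = cti_data

def match_arriving_call(dnis):
    # Child side: the post-route request returns the waiting CTI data for
    # this DNIS (None if no pre-call data is waiting, for example after a
    # network failure between the sites).
    return pending_precall.pop(dnis, None)
```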

      Post-route call flow

      Post-routing is used when a call is already at a peripheral ACD or IVR and needs to be routed intelligently to another agent or location. If an agent gets a call in the ACD or Unified CCE that needs to be sent to a different skill group or location, the agent can make use of the post-route functionality to reroute the call.

      The post-route call flow is as follows:
      1. The agent transfers the call to the local CTI route point for reroute treatment using the CTI agent desktop.
      2. The reroute application or script makes a post-route request to the Unified ICME parent by using the local Unified CCE Gateway PG connection.
      3. The Unified ICME parent maps the CTI route point from Unified CCE as the dialed number and uses that number to select a routing script. This script returns a label or routing instruction that can move the call to another site, to the same site but into a different skill group, or to a Unified CVP node for queuing.
      4. The Unified CCE receives the post-route response from the Unified ICME parent system and uses the returned routing label as a transfer number to send the call to the next destination.
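As a toy stand-in for the parent's routing script, the post-route exchange reduces to mapping a dialed number (the CTI route point) to a routing label that the child uses as the transfer number. The numbers and labels below are made up for illustration:

```python
# Hypothetical routing-script table: CTI route point -> routing label.
ROUTE_POINT_LABELS = {
    "8885551000": "81012345",  # move the call to another site
    "8885552000": "82054321",  # queue the call at a Unified CVP node
}

def post_route_request(cti_route_point):
    # Returns the label selected by the routing script, or None if no
    # script matches the dialed number.
    return ROUTE_POINT_LABELS.get(cti_route_point)
```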

      Parent/Child fault tolerance

      The parent/child model provides for fault tolerance to maintain a complete IP-ACD with either Unified CCX or Unified CCE deployed at the site, with local IP-PBX and call treatment and queuing functionality.

      Unified CCE child loses connection to Unified ICME parent

      If the WAN between the Unified CCE child site and the Unified ICME parent fails, the local Unified CCE system is isolated from the parent as well as the Unified CVP VG. Calls coming into the site will no longer get treatment from the Unified CVP under control of the Unified ICME parent, so the following functionality must be replicated locally, depending on the configuration at the child site.

      • For Unified CCE child configurations using local IP IVR resources for queue and treatment:
        • The local VG must have dial peer statements to pass control of the calls to the local Unified CM cluster if the parent Unified CVP Call Server cannot be reached. Also, the local Unified CM cluster must have CTI route points mapped to the inbound DNIS or dialed numbers that the local VG will present if the parent Unified CVP Call Server is not reached.
        • The local IP IVR must be configured with appropriate .wav files and applications that can be called by the Unified CCE child system locally to provide basic call treatment such as playing a welcome greeting or other message.
        • The child CCE Routing Script must handle queuing of calls for agents in local skill groups, instructing the IP IVR to play treatment in-queue while waiting for an agent.
        • Any data lookup or external CTI access that is normally provided by the parent Unified CVP or the parent Unified ICME must be provisioned locally to allow the agents to have full access to customer data for routing and screen pops.
        • Any post-routing transfer scripts will fail during this outage, so Unified CCE must be configured to handle this outage or prevent the post-route scripts from being accessed.
      • For Unified CCE child configurations using local Unified CVP resources for queue and treatment with Unified CCE 7.5(x):
        • The local VG must have dial peer statements to pass control of the calls to the local Unified CVP Call Server at the child site. Also, the inbound DNIS or dialed numbers that the local VG will present to the child Unified CVP must be configured in the child Unified CCE to process these calls locally at the child site.
        • The local VXML Gateways and Unified CVP Call Servers must be configured with appropriate .wav files and applications that can be called by the Unified CCE child system locally to provide basic call treatment such as playing a welcome greeting or other messages.
        • Self-service or Unified CVP Studio VXML applications normally provided by the parent Unified ICME must be replicated using Unified CVP VXML Server (web application server) at the child site to generate the dynamic VXML for these applications.
        • The child Unified CCE Routing Script must handle queuing of calls for agents in local skill groups, instructing the local Unified CVP at the child site to play treatment in-queue while waiting for an agent.
        • Any data lookup or external CTI access that is normally provided by the parent Unified CVP or the parent Unified ICME must be provisioned locally to allow the agents to have full access to customer data for call routing and screen pops.
        • Any post-routing transfer scripts will fail during this outage, so Unified CCE must be configured to handle this outage or prevent the post-route scripts from being accessed.

      Unified CCX child loses WAN to Unified ICME parent

      If the WAN between the Unified Contact Center Express (Unified CCX) child site and the Unified ICME parent fails, the local Unified CCX system is isolated from the parent as well as the Unified CVP VG. Calls coming into the site will no longer get treatment from the Unified CVP under control of the Unified ICME parent, so the following functionality must be replicated locally:
      • The local VG must have dial peer statements to pass control of the calls to the local Unified CM cluster if the parent Unified CVP Call Server cannot be reached.
      • Unified CCX JTAPI applications have to be mapped to these CTI route points to provide any typical inbound call treatment, such as playing a welcome greeting or other message.
      • The application has to provide for call queuing and treatment in queue while waiting for a local Contact Service Queue (CSQ) agent.
      • Any data lookup or external CTI access that is normally provided by the parent Unified CVP or the parent Unified ICME must be provisioned locally to allow the agents to have full access to customer data for call routing and screen pops.
      • Any post-routing applications or transfer scripts will fail during this outage, so the Unified CCX must be configured to handle this outage or prevent the post-route applications from being accessed.

      A similar failure occurs if the local Unified CVP ingress VGs cannot reach the parent Unified CVP Call Servers at the Unified ICME site. The local Unified CVP gateways are configured to fail over to the local Unified CM (or child Unified CVP) to route calls to the Unified CCX agents as described above. Likewise, if the entire Unified ICME parent fails, the local VGs controlled by the parent Unified CVP at the sites no longer have call control from the Unified ICME parent, and calls forward to the local sites for processing.

      Unified CCE Gateway PG cannot connect to Unified ICME parent

      If the Unified CCE Gateway PG fails or cannot communicate with the Unified ICME parent, the local agents are no longer seen as available to the Unified ICME parent, but the inbound calls to the site may still be under control of the parent Unified CVP. In this case, the Unified ICME parent cannot tell whether the remote Unified CCE Gateway PG has failed or whether the Unified CCE IP-ACD itself has failed locally.

      The Unified ICME at the parent location can automatically route around this site, considering it down until the PG comes back online and reports agent states again. Alternatively, the Unified ICME can direct a percentage of calls as blind transfers to the site's Unified CCE or Unified CCX using the local inbound CTI route points on Unified CM. This method presents calls with no CTI data from Unified CVP, but it allows the agents at the site to continue to receive calls locally through their Unified CCE/CCX system.

      If the local Unified CCE or Unified CCX child system fails, the Unified CCE Gateway PG cannot connect to it, and the Unified ICME parent then considers all of the agents to be offline and unavailable. If calls are sent to the local Unified CM while the child Unified CCE or Unified CCX system is down, the call-forward-on-failure processing for the CTI route point takes over the call. This processing can redirect the call to another site or to an answering resource that plays a message telling the caller there was an error and to call again later.

      Reporting and configuration impacts

      Whenever the Unified CCE child is disconnected from the Unified ICME parent, the local IP-ACD continues to collect reporting data and allows local users to make changes to the child routing scripts and configuration. The Unified CCE Gateway PG at the child site caches this data in memory (and eventually on disk) and sends it to the Unified ICME parent when the connection is restored. This functionality is available only if the Unified CCE Gateway PG is co-located at the child Unified CCE site.

      Other high availability considerations

      Multichannel components such as Cisco Unified Web and E-mail Interaction Manager and Cisco Outbound Option may be installed only at the child Unified CCE level, not at the parent. They are treated as nodal implementations on a site-by-site basis.

      Other Considerations for High Availability

      A Unified CCE fail-over can affect other parts of the solution. Although Unified CCE may stay up and running, some data can be lost during its fail-over, or other products that depend on Unified CCE to function properly might not be able to handle a Unified CCE fail-over. This section examines what happens to other critical areas in the Unified CCE solution during and after fail-over.

      Reporting

      The Unified CCE reporting feature uses real-time, five-minute, and reporting-interval (15- or 30-minute) data to build its reporting database. At the end of each five-minute and reporting interval, each Peripheral Gateway gathers the data it has kept locally and sends it to the Call Routers. The Call Routers process the data and send it to their local Logger for historical data storage. That data is then replicated from the Logger to the HDS database as it is written to the Logger database.

      The Peripheral Gateways buffer (in memory and on disk) the five-minute and reporting-interval data collected by the system to handle network connectivity failures or slow network response, and they automatically retransmit the data when network service is restored. However, physical failure of both Peripheral Gateways in a redundant pair can result in loss of any reporting-interval or five-minute data that has not yet been transmitted to the Central Controller. Use redundant Peripheral Gateways to reduce the chance of losing both physical hardware devices and their associated data during an outage window.
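The buffer-and-retransmit behavior described above can be modeled as a simple ordered queue. This is a simplified, memory-only sketch (real Peripheral Gateways also spool to disk); the class and method names are illustrative:

```python
from collections import deque

class IntervalDataBuffer:
    """Toy model of PG-side buffering: interval records queue up while
    the Central Controller link is down and drain in order when the
    link returns."""

    def __init__(self):
        self.pending = deque()

    def record(self, interval_data):
        # Called at the end of each five-minute or reporting interval.
        self.pending.append(interval_data)

    def transmit(self, link_up):
        # Send everything buffered, oldest first, if the link is up.
        sent = []
        while link_up and self.pending:
            sent.append(self.pending.popleft())
        return sent
```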

      When agents log out, all of their reporting statistics stop. The next time the agents log in, their real-time statistics start from zero. Typically, a Central Controller fail-over does not force the agents to log out or reset their statistics; however, if the PG fails over, the agent statistics are reset because the PIM and OPC processes that maintain these values in memory are restarted. If the CTI OS or CAD servers do not fail over or restart, the agent desktop functionality is restored to its pre-fail-over state.

      For further information, see the Reporting Guide for Cisco IPCC Enterprise & Hosted Editions.