Cisco IP Contact Center Enterprise Edition Releases 5.0 and 6.0 Solution Reference Network Design (SRND)
Design Considerations for High Availability

Table Of Contents

Design Considerations for High Availability

Designing for High Availability

Data Network Design Considerations

Cisco CallManager and CTI Manager Design Considerations

Configuring ICM for CTI Manager Redundancy

IP IVR (CRS) Design Considerations

IP IVR (CRS) High Availability Using Cisco CallManager

IP IVR (CRS) High Availability Using ICM

Internet Service Node (ISN) Design Considerations

Multi-Channel Design Considerations (Cisco Email Manager Option and Cisco Collaboration Server Option)

Cisco Email Manager Option

Cisco Collaboration Server Option

Cisco IPCC Outbound Option Design Considerations

Peripheral Gateway Design Considerations

Cisco CallManager Failure Scenarios

ICM Failover Scenarios

Scenario 1 - Cisco CallManager and CTI Manager Fail

Scenario 2 - Cisco CallManager PG Side A Fails

Scenario 3 - Only Cisco CallManager Fails

Scenario 4 - Only CTI Manager Fails

IPCC Scenarios for Clustering over the WAN

Scenario 1 - ICM Central Controller or Peripheral Gateway Private Network Fails

Scenario 2 - Visible Network Fails

Scenario 3 - Visible and Private Networks Both Fail (Dual Failure)

Scenario 4 - Remote Agent Location WAN Fails

Understanding Failure Recovery

Cisco CallManager Service

IP IVR (CRS)

ICM

Cisco CallManager PG and CTI Manager Service

ICM Voice Response Unit PG

ICM Call Router and Logger

Admin Workstation Real-Time Distributor (RTD)

CTI Server

CTI OS Considerations

Cisco Agent Desktop Considerations

Other Considerations


Design Considerations for High Availability


This chapter covers several possible IPCC failover scenarios and explains design considerations for providing high availability of system functions and features in each of those scenarios. This chapter contains the following sections:

Designing for High Availability

Data Network Design Considerations

Cisco CallManager and CTI Manager Design Considerations

IP IVR (CRS) Design Considerations

Internet Service Node (ISN) Design Considerations

Multi-Channel Design Considerations (Cisco Email Manager Option and Cisco Collaboration Server Option)

Cisco Email Manager Option

Cisco Collaboration Server Option

Cisco IPCC Outbound Option Design Considerations

Peripheral Gateway Design Considerations

Understanding Failure Recovery

CTI OS Considerations

Cisco Agent Desktop Considerations

Other Considerations

Designing for High Availability

Cisco IPCC is a distributed solution that uses numerous hardware and software components, and it is important to design each deployment in such a way that a failure will impact the fewest resources in the call center. The type and number of resources impacted will depend on how stringent your requirements are and which design characteristics you choose for the various IPCC components, including the network infrastructure. A good IPCC design will be tolerant of most failures (defined later in this section), but not all failures can be made transparent.

Cisco IPCC is a sophisticated solution designed for mission-critical call centers. The success of any IPCC deployment requires a team with experience in data and voice internetworking, system administration, and IPCC application configuration.

Before implementing IPCC, prepare and plan the design carefully to avoid costly upgrades or maintenance later in the deployment cycle. Always design for the worst possible failure scenario, and keep future scalability in mind for all IPCC sites.

In summary, plan ahead and follow all the design guidelines and recommendations presented in this guide and in the Cisco IP Telephony Solution Reference Network Design (SRND) guide, available at

http://www.cisco.com/go/srnd

For assistance in planning and designing your IPCC solution, consult your Cisco or certified Partner Systems Engineer (SE).

Figure 3-1 shows a high-level design for a fault-tolerant IPCC single-site deployment.

Figure 3-1 IPCC Single-Site Design for High Availability

In Figure 3-1, each component in the IPCC site is duplicated for redundancy and connected to all of its primary and backup servers, with the exception of the intermediate distribution frame (IDF) switch for the IPCC agents and their phones. The IDF switches do not interconnect with each other, but only with the main distribution frame (MDF) switches, because it is better to distribute the agents among different IDF switches for load balancing and for geographic separation (for example, different building floors or different cities). If an IDF switch fails, all calls should be routed to other available agents in a separate IDF switch or to an IP IVR (CRS) queue. Follow the design recommendations for a single-site deployment as documented in the Cisco IP Telephony Solution Reference Network Design (SRND) guide, available at

http://www.cisco.com/go/srnd

If designed correctly for high availability and load balancing, any IPCC site can lose half of its systems and still be operational. With this type of design, no matter what happens in the IPCC site, each call should be handled in one of the following ways:

Routed and answered by an available IPCC agent

Sent to an available IP IVR (CRS) or ISN port

Answered by the Cisco CallManager AutoAttendant

Played an IP IVR (CRS) or ISN announcement stating that the call center is currently experiencing technical difficulties and asking the caller to call back later

The components in Figure 3-1 can be rearranged to form two connected IPCC sites, as illustrated in Figure 3-2.

Figure 3-2 IPCC Single-Site Redundancy

Figure 3-2 emphasizes the redundancy of the single-site design in Figure 3-1. Side A and Side B are basically mirror images of each other. In fact, one of the main IPCC features to enhance high availability is its simple mechanism for converting a site from non-redundant to redundant. To implement IPCC high availability, all you have to do is duplicate the first side and cross-connect all the corresponding parts.

The following sections use Figure 3-1 as the model design to discuss issues and features that you should consider when designing IPCC for high availability. These sections use a bottom-up model (from a network model perspective, starting with the physical layer first) that divides the design into segments that can be deployed in separate stages.

Cisco recommends using only duplex (redundant) Cisco CallManager, IP-IVR/ISN, and ICM configurations for all IPCC deployments that require high availability. This chapter assumes that the IPCC failover feature is a critical requirement for all deployments, and therefore it presents only deployments that use a redundant (duplex) Cisco CallManager configuration, with each Cisco CallManager cluster having at least one publisher and one subscriber. Additionally, where possible, deployments should follow the best practice of having no devices, call processing, or CTI Manager services running on the Cisco CallManager publisher.

Data Network Design Considerations

The IPCC design shown in Figure 3-3 starts from a time division multiplexing (TDM) call access point and ends where the call reaches an IPCC agent. At the bottom of the design, the network infrastructure supports data and voice traffic for the IPCC environment. The network, including the PSTN, is the foundation for the IPCC solution. If the network is poorly designed to handle failures, then everything in the call center is prone to failure because all the servers and network devices depend on the network for communication. Therefore, the data and voice networks must be a primary part of your solution design and the first stage for all IPCC implementations.

In addition, the choice of voice gateways for a deployment is critical because some protocols offer more call resiliency than others. This chapter focuses on how the voice gateways should be configured for high availability with the Cisco CallManager cluster(s).

For more information on voice gateways and voice networks in general, refer to the Cisco IP Telephony Solution Reference Network Design (SRND) guide, available at

http://www.cisco.com/go/srnd

Figure 3-3 High Availability in a Network with Two Voice Gateways and One Cisco CallManager Cluster

Using multiple voice gateways avoids the problem of a single gateway failure causing blockage of all calls. In a configuration with two voice gateways and one Cisco CallManager cluster, each gateway should register with a different primary Cisco CallManager to spread the workload across the Cisco CallManagers in the cluster. Each gateway should use the other Cisco CallManager as a backup in case its primary Cisco CallManager fails. Refer to the Cisco IP Telephony Solution Reference Network Design (SRND) guide (available at http://www.cisco.com/go/srnd) for details on setting up Cisco CallManager redundancy groups for backup.
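
For example, if the voice gateways are MGCP-controlled (H.323 gateways achieve the same behavior through dial-peer configuration instead), the primary and backup Cisco CallManagers can be defined directly in IOS. The following is a minimal sketch for the first gateway; the addresses are hypothetical, and the second gateway would simply reverse the primary and backup roles:

    ! Enable Cisco CallManager control of the MGCP gateway
    ccm-manager mgcp
    ! Backup Cisco CallManager, used if the primary fails (hypothetical address)
    ccm-manager redundant-host 10.10.10.12
    mgcp
    ! Primary Cisco CallManager for this gateway (hypothetical address)
    mgcp call-agent 10.10.10.11 service-type mgcp version 0.1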

When calculating the number of trunks from the PSTN, ensure that enough trunks are available to handle the maximum busy hour call attempts (BHCA) when one or more voice gateways fail. During the design phase, first decide how many simultaneous voice gateway failures are acceptable for the site. Based upon this requirement, the number of voice gateways used, and the distribution of trunks across those voice gateways, you can determine the number of trunks required. The more you distribute the trunks over multiple voice gateways, the fewer trunks you will need. However, using more voice gateways increases the cost of that component of the solution, so you should compare the annual operating cost of the trunks (paid to the PSTN provider) against the one-time fixed cost of the voice gateways.

For example, assume the call center has a maximum BHCA that results in the need for four T1 lines, and the company has a requirement for no call blockage in the event of a single component (voice gateway) failure. If two voice gateways are deployed in this case, then each voice gateway should be provisioned with four T1 lines (total of eight). If three voice gateways are deployed, then two T1 lines per voice gateway (total of six) would be enough to achieve the same level of availability. If five voice gateways are deployed, then one T1 per voice gateway (total of five) would be enough to achieve the same level of availability. Thus, you can reduce the number of T1 lines required by adding more voice gateways.
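
The arithmetic behind this example can be generalized as follows, where T is the number of trunks required to carry the maximum BHCA, N is the number of voice gateways, and F is the number of simultaneous gateway failures that must be survived without call blockage:

    Trunks per gateway = ceiling( T / (N - F) )
    Total trunks       = N x ceiling( T / (N - F) )

With T = 4 T1 lines and F = 1, as in the example, N = 2 gateways requires 4 T1 lines per gateway (8 total), N = 3 requires 2 per gateway (6 total), and N = 5 requires 1 per gateway (5 total).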

The operational cost savings of fewer T1 lines may be greater than the one-time capital cost of additional voice gateways. In addition to the recurring operational costs of the T1 lines, you should also factor in the one-time installation cost of the T1 lines to ensure that your design accounts for the most cost-effective solution. Every installation has different availability requirements and cost metrics, but using multiple voice gateways is often more cost-effective. Therefore, it is a worthwhile design practice to perform this cost comparison.

After you have determined the number of trunks needed, the PSTN service provider has to configure them in such a way that calls can be terminated onto trunks connected to all of the voice gateways (or at least more than one voice gateway). From the PSTN perspective, if the trunks going to the multiple voice gateways are configured as a single large trunk group, then all calls will automatically be routed to the surviving voice gateways when one voice gateway fails. If all of the trunks are not grouped into a single trunk group within the PSTN, then you must ensure that PSTN re-routing or overflow routing to the other trunk groups is configured for all dialed numbers.

If a voice gateway with a digital interface (T1 or E1) fails, then the PSTN automatically stops sending calls to that voice gateway because the carrier level signaling on the digital circuit has dropped. Loss of carrier level signaling causes the PSTN to busy out all trunks on that digital circuit, thus preventing the PSTN from routing new calls to the failed voice gateway. When the failed voice gateway comes back on-line and the circuits are back in operation, the PSTN automatically starts delivering calls to that voice gateway again.

Because the voice gateways register with a primary Cisco CallManager, an increase in the amount of traffic on a given voice gateway will result in more traffic being handled by its primary Cisco CallManager. Therefore, when sizing the Cisco CallManager servers, plan for the possible failure of a voice gateway and calculate the maximum number of trunks that may be in use on the remaining voice gateways registered with each CallManager server.

With standalone voice gateways, it is possible that the voice gateway itself is operational but that its communication paths to the Cisco CallManager servers are severed (for example, a failed Ethernet connection). If this occurs with an H.323 gateway, you can use the busyout-monitor interface voice-port configuration command to monitor the Ethernet interfaces on the voice gateway and place a voice port into a busyout monitor state. To remove the busyout monitor state on the voice port, use the no form of this command.
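
The following is a minimal sketch of this configuration; the voice port and monitored interface numbering are hypothetical, and the exact syntax varies by IOS release:

    voice-port 1/0:23
     ! Busy out this port toward the PSTN whenever the monitored
     ! Ethernet interface (the path to Cisco CallManager) goes down
     busyout-monitor interface FastEthernet 0/0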

When the voice gateway interface to the switch fails, the voice gateway automatically busies out all its trunks. This prevents new calls from being routed to this voice gateway from the PSTN. Calls in progress do not survive because the Real-Time Transport Protocol (RTP) stream connection no longer exists. Parties at both ends of the line receive silence and, after a configurable timeout, calls are cleared. You can set the Transmission Control Protocol (TCP) timeout parameter in the voice gateway, and you can also set a default timeout in Cisco CallManager. The calls are cleared by whichever timeout expires first. When the voice gateway interface to the switch recovers, the trunks are automatically idled and the PSTN should begin routing calls to this voice gateway again (assuming the PSTN has not permanently busied out those trunks).

Cisco CallManager and CTI Manager Design Considerations

Cisco CallManager Release 3.3(x) and later uses CTI Manager, a service that acts as an application broker and handles all of the cluster's CTI resources, abstracting the physical binding of the application to a particular Cisco CallManager server. (Refer to the Cisco IP Telephony Solution Reference Network Design (SRND) guide for further details about the architecture of the CTI Manager.) The CTI Manager and Cisco CallManager are two separate services running on a Cisco CallManager server. Other services running on a Cisco CallManager server include the TFTP, Cisco Messaging Interface, and Real-time Information Server (RIS) Data Collector services.

The main function of the CTI Manager is to accept messages from external CTI applications and send them to the appropriate resource in the Cisco CallManager cluster. The CTI Manager uses the Cisco JTAPI link to communicate with the applications. It acts like a JTAPI messaging router. The JTAPI client library in Cisco CallManager Release 3.1 and above connects to the CTI Manager instead of connecting directly to the Cisco CallManager service, as in releases prior to 3.1. In addition, there can be multiple CTI Manager services running on different Cisco CallManager servers in the cluster that are aware of each other (via the Cisco CallManager service, which is explained later in this section). The CTI Manager uses the same Signal Distribution Layer (SDL) signaling mechanism that the Cisco CallManager services in the cluster use to communicate with each other. However, the CTI Manager does not directly communicate with the other CTI Managers in its cluster. (This is also explained later in detail.)

The main function of the Cisco CallManager service is to register and monitor all the IP telephony devices. It basically acts as a switch for all the IP telephony resources and devices in the system, while the CTI Manager service acts as a router for all the CTI application requests for the system devices. Some of the devices that can be controlled by JTAPI that register with the Cisco CallManager service include the IP phones, CTI ports, and CTI route points.

Figure 3-4 illustrates some of the functions of Cisco CallManager and the CTI Manager.

Figure 3-4 Functions of Cisco CallManager and the CTI Manager

The servers in a Cisco CallManager cluster communicate with each other using the Signal Distribution Layer (SDL) service. SDL signaling is used only by the Cisco CallManager service to talk to the other Cisco CallManager services and keep everything in sync within the Cisco CallManager cluster. The CTI Managers in the cluster are completely independent and do not establish a direct connection with each other. A CTI Manager routes external CTI application requests only to devices serviced by the local Cisco CallManager service on its own subscriber. If the device is not resident on the local Cisco CallManager subscriber, then the Cisco CallManager service forwards the application request to the appropriate Cisco CallManager in the cluster. Figure 3-5 shows the flow of a device request to another Cisco CallManager in the cluster.

Figure 3-5 CTI Manager Device Request to a Remote Cisco CallManager

It is important to load-balance devices and CTI applications evenly across all the nodes in the Cisco CallManager cluster.

The external CTI applications use a JTAPI user account on the CTI Manager to establish a connection and assume control of the Cisco CallManager devices registered to that JTAPI user. In addition, because the CTI Managers are independent of each other, any CTI application can connect to any CTI Manager to perform its requests. However, because the CTI Managers are independent, one CTI Manager cannot pass the CTI application to another CTI Manager upon failure. If the first CTI Manager fails, the external CTI application must implement the failover mechanism to connect to another CTI Manager in the cluster. For example, the Voice Response Unit (VRU) Peripheral Gateway (PG) allows the administrator to input two CTI Managers, primary and secondary, in its JTAPI subsystem. The Cisco CallManager PG handles CTI Manager failover by using its two sides, A and B, which both log into the same JTAPI user on their respective CTI Managers upon initialization. However, to conserve system resources in the Cisco CallManager cluster, only one Cisco CallManager PG side registers and monitors the user devices at a time. The other side of the Cisco CallManager and VRU PG stays in hot-standby mode, waiting to be activated immediately upon failure of the active side.

The CTI applications can use the same JTAPI user multiple times to log into separate CTI Managers. This feature allows you to load-balance the CTI application connections across the cluster, and it adds an extra layer of failover and redundancy at the CTI application level by allowing multiple connections to separate CTI Managers while using the same JTAPI user to maintain control. However, keep in mind that every time a JTAPI connection is established with a CTI Manager (JTAPI user logs into a CTI Manager), the server CPU and memory usage will increase because the CTI application registers and monitors events on all the devices associated with the JTAPI user. Therefore, make sure to allocate the CTI application devices so that they are local to the CTI Manager where the application is connected. (See Figure 3-6.)

Figure 3-6 CTI Application Device Registration

Figure 3-6 shows two external CTI applications using the CTI Manager, the Cisco CallManager PG, and the IP IVR (CRS). The Cisco CallManager PG logs into the CTI Manager using the JTAPI account User 1, while IP IVR (CRS) uses User 2. Each subscriber has two phones to load-balance the calls, and each server has one JTAPI connection to load-balance the CTI applications.

To avoid overloading the available resources, it is best to load-balance devices (phones, gateways, ports, CTI Route Points, CTI Ports, and so forth) and CTI applications evenly across all the nodes in the Cisco CallManager cluster.

Cisco CallManager and CTI Manager design should be the second design stage, right after the network design stage, and deployment should occur in this same order. The reason is that the IP telephony infrastructure must be in place to dial and receive calls using its devices before you can deploy any telephony applications. Before moving to the next design stage, make sure that a PSTN phone can call an IP phone and that this same IP phone can dial out to a PSTN phone, with all the call survivability capabilities considered for treating these calls. Also keep in mind that the Cisco CallManager cluster is the heart of the IPCC system, and any server failure in a cluster will take down two services (CTI and Cisco CallManager), thereby adding extra load to the remaining servers in the cluster.

Distribute Cisco CallManager devices (phones, CTI ports, and CTI route points) evenly across all Cisco CallManagers. Also be sure that all servers can handle the load for the worst-case scenarios, where they are the only remaining server in their cluster. For more information on how to load-balance the Cisco CallManager clusters, refer to the Cisco IP Telephony Solution Reference Network Design (SRND) guide, available at

http://www.cisco.com/go/srnd

Configuring ICM for CTI Manager Redundancy

To enable Cisco CallManager support for CTI Manager failover in a duplex Cisco CallManager model, perform the following steps:


Step 1 Create a Cisco CallManager redundancy group, and add subscribers to the group. (Publishers and TFTP servers should not be used for call processing, device registration, or CTI Manager use.)

Step 2 Designate two CTI Managers to be used for each side of the duplex Peripheral Gateway (PG).

Step 3 Assign one of the CTI Managers to be the JTAPI service of the Cisco CallManager PG side A. (See Figure 3-7.)

Step 4 Assign the remaining CTI Manager to be the JTAPI service of the Cisco CallManager PG side B. (See Figure 3-7.)


Figure 3-7 Assigning CTI Managers for PG Sides A and B

IP IVR (CRS) Design Considerations

The JTAPI subsystem in IP IVR (CRS) can establish connections with two CTI Managers. This feature enables IPCC designs to add IP IVR redundancy at the CTI Manager level in addition to using the ICM script to check for the availability of IP IVR before sending a call to it. Load balancing is highly recommended to ensure that all IP IVRs are used in the most efficient way.

Figure 3-8 shows two IP IVR (CRS) servers configured for redundancy within one Cisco CallManager cluster. The IP IVR group should be configured so that each server is connected to a different CTI Manager service on different Cisco CallManager subscribers in the cluster for load balancing and high availability. Using the redundancy feature of the JTAPI subsystem in the IP IVR server, you can add the IP addresses or host names of two Cisco CallManagers from the cluster. Then, if one of the Cisco CallManagers fails, the IP IVR associated with that particular Cisco CallManager will fail-over to the second Cisco CallManager.

Figure 3-8 High Availability with Two IP IVR Servers and One Cisco CallManager Cluster

You can increase IP IVR (CRS) availability by using one of the following optional methods:

Call-forward-busy and call-forward-on-error features in Cisco CallManager. This method is more complicated, and Cisco recommends it only for special cases where a few critical CTI route points and CTI ports absolutely must have high availability down to the call processing level in Cisco CallManager. For more information on this method, see IP IVR (CRS) High Availability Using Cisco CallManager.

ICM script features to check the availability of an IP IVR prior to sending a call to it. For more information on this method, see IP IVR (CRS) High Availability Using ICM.


Note Do not confuse the IP IVR (CRS) subsystems with services. IP IVR uses only one service, the Cisco Application Engine service. The IP IVR subsystems are connections to external applications such as the CTI Manager and ICM.


IP IVR (CRS) High Availability Using Cisco CallManager

You can implement IP IVR (CRS) port high availability by using any of the following call forward features in Cisco CallManager:

Forward Busy — forwards calls to another port or route point when Cisco CallManager detects that the port is busy. This feature can be used to forward calls to another CTI port when an IP IVR CTI port is busy due to an IP IVR application problem, such as running out of available CTI ports.

Forward No Answer — forwards calls to another port or route point when Cisco CallManager detects that a port has not picked up a call within the timeout period set in Cisco CallManager. This feature can be used to forward calls to another CTI port when an IP IVR CTI port is not answering due to an IP IVR application problem.

Forward on Failure — forwards calls to another port or route point when Cisco CallManager detects a port failure caused by an application error. This feature can be used to forward calls to another CTI port when an IP IVR CTI port fails due to an application error.


Note When using the call forwarding features to implement high availability of IP IVR ports, avoid creating a loop in the event that all the IP IVR servers are unavailable. Basically, do not establish a path back to the first CTI port that initiated the call forwarding.


IP IVR (CRS) High Availability Using ICM

You can implement IP IVR (CRS) high availability through ICM scripts. You can prevent calls from queuing to an inactive IP IVR by using the ICM scripts to check the IP IVR Peripheral Status before sending the calls to it. For example, you can program an ICM script to check if the IP IVR is active by using an IF node or by configuring a Translation Route to the Voice Response Unit (VRU) node (by using the consider if field). This method can be modified to load-balance ports across multiple IP IVRs, and it is easily scalable to virtually any number of IP IVRs.
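
For example, the IF node (or the consider if field) can test a peripheral status expression similar to the following, where IVR1 is the hypothetical configured name of the IP IVR peripheral; the exact object and attribute names depend on the ICM release:

    Peripheral.IVR1.Online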


Note All calls at the IP IVR are dropped if the IP IVR server, IVR-to-CallManager JTAPI link, or the IP IVR PG fails.


Internet Service Node (ISN) Design Considerations

The Internet Service Node (ISN) may be deployed with IPCC as an alternative to IP IVR for call treatment and queueing. ISN is different from IP IVR in that it does not rely on Cisco CallManager for JTAPI call control. ISN uses H.323 for call control and is used "in front of" Cisco CallManager or other PBX systems as part of a hybrid IPCC or migration solution. (See Figure 3-9.)

Figure 3-9 High Availability with Two ISN Servers

ISN uses the following system components:

Cisco Voice Gateway

The Cisco Voice Gateway is typically used to terminate TDM PSTN trunks and transform the incoming calls into IP-based calls on the IP network. With ISN, these voice gateways also provide additional functionality by using the IOS built-in Voice Extensible Markup Language (VXML) Voice Browser to provide caller treatment and call queueing on the voice gateway without having to move the call to a physical IP IVR. The gateway can also use the Media Resource Control Protocol (MRCP) interface to add Automatic Speech Recognition (ASR) and Text-To-Speech (TTS) functions on the gateway under ISN control.

ISN Voice Browser

The ISN Voice Browser is used in conjunction with the VXML Voice Browser on the voice gateway to provide call control signalling when calls are switched between the ingress gateway and another endpoint gateway or IPCC agent. The voice browsers register with the application servers as well as gatekeepers so that, when new calls come into the system, the gatekeeper can associate the dialed number with a particular set of voice browsers that can handle the call.

ISN Application Server

The ISN Application Server controls the voice browsers (both VXML Voice Browsers on gateways and ISN Voice Browsers) and interfaces to the ICM Peripheral Gateway to obtain instructions and pass data to the ICM routing script. Instructions from the ICM Peripheral Gateway are translated by the Application Server to VXML code and sent to the voice browsers for processing.

ISN Media Server

The ISN caller treatment is provided either by using ASR/TTS functions via MRCP or with predefined .wav files stored on media servers. The Media Servers act as web servers and serve up the .wav files to the voice browsers as part of their VXML processing. Media Servers can be clustered using the Cisco Content Services Switch (CSS) products, thus allowing multiple Media Servers to be pooled behind a single URL for access by all the voice browsers in the network.

H.323 Gatekeepers

Gatekeepers are used with ISN to register the voice browsers and associate them with specific dialed numbers. When calls come into the network, the gateway queries the gatekeeper to find out where to send the call based upon the dialed number. The gatekeeper is also aware of the state of the voice browsers and will load-balance calls across them, avoiding voice browsers that are out of service or that have no available sessions.

Availability of the ISN can be increased by:

Adding additional redundant ISN systems under control of the ICM Peripheral gateways, thus allowing the ICM to balance the calls across the platforms

Adding additional ISN components to an ISN system (for example, a single ISN with multiple voice browsers)

Adding gatekeeper redundancy with HSRP (see the sketch following this list)

Adding a Cisco Content Services Switch (CSS) to load-balance .wav file requests across multiple ISN Media Servers
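
As a sketch of the HSRP option, two gatekeeper routers can share a virtual RAS address so that gateways and voice browsers always register with one well-known gatekeeper address. The addresses, zone name, and group number below are hypothetical, and the standby router mirrors this configuration with a lower priority:

    interface FastEthernet0/0
     ip address 10.20.20.2 255.255.255.0
     ! Virtual gatekeeper address used by endpoints for RAS registration
     standby 1 ip 10.20.20.1
     standby 1 priority 110
     standby 1 preempt
    !
    gatekeeper
     zone local isn-gk example.com 10.20.20.1
     no shutdown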


Note Calls in ISN are not dropped if the Application Server or ISN PG fails. As part of the fault-tolerant design, TCL scripts in the voice gateway (provided with the ISN images) can redirect such calls to another ISN Voice Browser or to another ISN-controlled gateway.
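
As an illustration of how those scripts are attached, the bootstrap TCL application supplied with the ISN images is loaded on the ingress gateway and associated with the inbound dial-peer. The application name, file location, and dialed number below are hypothetical:

    ! Load the ISN-supplied survivability/bootstrap TCL script
    call application voice isn-bootstrap flash:bootstrap.tcl
    !
    dial-peer voice 1000 pots
     incoming called-number 8005550100
     direct-inward-dial
     application isn-bootstrap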


For more information on these options, review the ISN product documentation at:

http://www.cisco.com/univercd/cc/td/doc/product/icm/isn/isn21/index.htm

Multi-Channel Design Considerations (Cisco Email Manager Option and Cisco Collaboration Server Option)

The IPCC Enterprise solution can be extended to support multi-channel customer contacts, with email and web contacts being routed by the IPCC to agents in a blended or "universal queue" mode. The following optional components are integrated into the IPCC architecture (see Figure 3-10):

Media Routing Peripheral Gateway

To route multi-channel contacts, the Cisco Email Manager and the Cisco Collaboration Server Media Blender communicate with the Media Routing Peripheral Gateway. The Media Routing Peripheral Gateway, like any peripheral gateway, can be deployed in a redundant or duplex manner, with two servers interconnected for high availability. Typically, the Media Routing Peripheral Gateway is co-located with the Central Controller and has an IP-socket connection to the multi-channel systems.

Admin Workstation ConAPI Interface

The integration of the Cisco multi-channel options allows for the ICM and optional systems to share configuration information about agents and their related skill groups. The Configuration Application Programming Interface (ConAPI) runs on an Admin Workstation and can be configured with a backup service running on another Admin Workstation.

Agent Reporting and Management (ARM) and Task Event Services (TES) Connections

ARM and TES services provide voice call (ARM) and non-voice task (TES) state and event notification from the IPCC CTI Server to the multi-channel systems. These connections provide agent information to the email and web environments and also accept and process task requests from them. The connection is a TCP/IP socket to the agent's associated CTI Server, which can be deployed as a redundant or duplex pair on the Agent Peripheral Gateway.

Figure 3-10 Multi-Channel System

Recommendations for high availability:

Deploy the Media Routing Peripheral Gateways in duplex pairs.

Deploy ConAPI on a redundant pair of Admin Workstations that are not used for configuration and scripting, so that they are less likely to be shut off or rebooted. Also consider using the HDS servers at the central sites to host this function.

Deploy the IPCC Agent Peripheral Gateways and CTI Servers in duplex pairs.

Cisco Email Manager Option

The Cisco Email Manager is integrated with the IPCC Enterprise Edition to provide full email support in the multi-channel contact center with IPCC. It can be deployed using a single server (see Figure 3-11) for small deployments or with multiple servers to meet larger system design requirements. The major components of Cisco Email Manager are:

Cisco Email Manager Server — The core routing and control server; it is not redundant.

Cisco Email Manager Database Server — The server that maintains the online database of all emails and configuration and routing rules in the system. It can be co-resident on the Cisco Email Manager server for smaller deployments or on a dedicated server for larger systems.

Cisco Email Manager UI Server — This server allows the agent user interface (UI) components to be off-loaded from the main Cisco Email Manager server to scale for larger deployments or to support multiple remote agent sites. Each remote site could have a local UI Server to reduce the data traffic from the agent browser clients to the Cisco Email Manager server (see Figure 3-12).

Figure 3-11 Single Cisco Email Manager Server

Figure 3-12 Multiple UI Servers

Cisco Collaboration Server Option

The Cisco Collaboration Server is integrated with the IPCC Enterprise Edition to provide web chat and co-browsing support in the multi-channel contact center with IPCC. The major components of the Cisco Collaboration Server are (see Figure 3-13):

Cisco Collaboration Server — Collaboration servers are deployed outside the corporate firewall in a demilitarized zone (DMZ) with the corporate web servers they support. You can deploy multiple collaboration servers for larger systems.

Cisco Collaboration Server Database Server — This is the server that maintains the online database of all chat and browsing sessions as well as configuration and routing rules in the system. It can be co-resident on the Cisco Collaboration Server; however, because the Cisco Collaboration Server is outside the firewall, most enterprises deploy it on a separate server inside the firewall to protect the historical data in the database. Multiple Cisco Collaboration Servers can point to the same database server to reduce the total number of servers required for the solution.

Cisco Collaboration Server Media Blender — This server polls the collaboration servers to check for new requests, and it manages the Media Routing and CTI/Task interfaces to connect the agent and caller. Each IPCC Agent Peripheral Gateway will have its own Media Blender, and each Media Blender will have a Media Routing peripheral interface manager (PIM) component on the Media Routing Peripheral Gateway.

Cisco Collaboration Dynamic Content Adaptor (DCA) — This server is deployed in the DMZ with the collaboration server, and it allows the system to share content that is generated dynamically by programs on the web site (as opposed to static HTTP pages).

Figure 3-13 Cisco Collaboration Server

Cisco IPCC Outbound Option Design Considerations

The Outbound Option provides the ability for IPCC Enterprise to place calls on behalf of agents to customers based upon a predefined campaign. The major components of the Outbound Option are (see Figure 3-14):

Outbound Option Campaign Manager — A software module that manages the dialing lists and rules associated with the calls to be placed. This software is loaded on the Logger platform and is not redundant; it can be loaded and active on only one server of the duplex pair of Loggers in the IPCC system.

Outbound Option Dialer — A software module that performs the dialing tasks on behalf of the Campaign Manager. In IPCC, the Dialer emulates a set of IP phones for Cisco CallManager to place the outbound calls, and it detects when the customer answers and manages the interaction with the CTI OS server to transfer the call to an agent. It also interfaces with the Media Routing Peripheral Gateway, and each Dialer has its own peripheral interface manager (PIM) on the Media Routing Peripheral Gateway.

Figure 3-14 IPCC Outbound Option

The system can support multiple dialers across the enterprise, all of which are under control of the central Campaign Manager software. Although they do not function as a redundant or duplex pair the way a Peripheral Gateway does, with a pair of dialers under control of the Campaign Manager, a failure of one of the dialers can be handled automatically and calls will continue to be handled by the surviving dialer. Any calls that were already connected to agents would remain connected and would experience no impact from the failure.

For smaller implementations, the Dialer could be co-resident on the IPCC Peripheral Gateway. For larger systems, the Dialer should be on its own server, or you could possibly use multiple Dialers under control of the central Campaign Manager.

Recommendations for high availability:

Deploy the Media Routing Peripheral Gateways in duplex pairs.

Deploy Dialers on their own servers as standalone devices to eliminate a single point of failure. (If they were co-resident on a PG, the dialer would go down if the PG server failed.)

Deploy multiple Dialers and make use of them in the Campaign Manager to allow for automatic fault recovery to a second Dialer in the event of a failure.

Include Dialer "phones" (virtual phones in Cisco CallManager) in redundancy groups in Cisco CallManager to allow them to fail-over to a different subscriber, as would any other phone or device in the Cisco CallManager cluster.

Peripheral Gateway Design Considerations

The ICM CallManager Peripheral Gateway (PG) uses the Cisco CallManager CTI Manager process to communicate with the Cisco CallManager cluster, with a single Peripheral Interface Manager (PIM) controlling agent phones anywhere on the cluster. The Peripheral Gateway PIM process registers with CTI Manager on one of the Cisco CallManager servers in the cluster, and the CTI Manager accepts all JTAPI requests from the PG for the cluster. If the phone, route point, or other device being controlled by the PG is not registered to that specific Cisco CallManager server in the cluster, the CTI Manager forwards that request via Cisco CallManager SDL links to the other Cisco CallManager servers in the cluster. There is no need for a PG to connect to multiple Cisco CallManager servers in a cluster.

Duplex Cisco CallManager PG implementations are highly recommended because a simplex PG has only one connection to the Cisco CallManager cluster, using a single CTI Manager. If that CTI Manager were to fail, the PG would no longer be able to communicate with the Cisco CallManager cluster. Adding a redundant or duplex PG allows the ICM to have a second pathway or connection to the Cisco CallManager cluster, using a second CTI Manager process on a different Cisco CallManager server in the cluster.

The minimum requirement for ICM high-availability support for CTI Manager and IP IVR (CRS) is a duplex (redundant) Cisco CallManager PG environment with one Cisco CallManager cluster containing at least two servers. Therefore, the minimum configuration for a Cisco CallManager cluster in this case is one publisher and one subscriber. (See Figure 3-15.)

Figure 3-15 ICM High Availability with One Cisco CallManager Cluster

Redundant ICM servers can be located at the same physical site or geographically distributed. In both cases, the ICM Call Router and Logger/Database Server processes are interconnected through a private, dedicated LAN. If the servers are located at the same site, you can provide the private LAN by inserting a second NIC card in each server (sides A and B) and connecting them with a crossover cable. If the servers are geographically distributed, you can provide the private LAN by inserting a second NIC card in each server (sides A and B) and connecting them with a dedicated T1 line that meets the specific network requirements for this connection as documented in the Cisco ICM Software Installation Guide, available at

http://www.cisco.com/univercd/cc/td/doc/product/icm/icmentpr/icm50doc/coreicm5/plngupin/instlgd.pdf

Within the ICM PG, two software processes manage the connectivity to the Cisco CallManager cluster: the JTAPI Gateway and the CallManager PIM. The JTAPI Gateway is started by the PG automatically and runs as a node-managed process, which means that the PG monitors this process and automatically restarts it if it fails for any reason. The JTAPI Gateway handles the low-level JTAPI socket connection protocol and messaging between the PIM and the Cisco CallManager CTI Manager, and it is specific to the version of Cisco CallManager.

The ICM PG PIM is also a node-managed process and is monitored for unexpected failures and automatically restarted. This process manages the higher-level interface between the ICM and the Cisco CallManager cluster, requesting specific objects to monitor and handling route requests from the Cisco CallManager cluster.

In a duplex ICM PG environment, the JTAPI services on both Cisco CallManager PG sides log into their respective CTI Managers upon initialization. Cisco CallManager PG side A logs into the primary CTI Manager, while PG side B logs into the secondary CTI Manager. However, only the active side of the Cisco CallManager PG registers monitors for phones and CTI route points. The duplex ICM PG pair works in hot-standby mode, with only the active PG side PIM communicating with the Cisco CallManager cluster. The standby side logs into the secondary CTI Manager only to initialize the interface and prime it for a failover. Registration and initialization of the Cisco CallManager devices take a significant amount of time, and having the CTI Manager primed significantly decreases the failover time.

In duplex PG operation, the PG side that is able to connect to the ICM Call Router Server and request configuration information first will be the side that goes active. It is not deterministic based upon the Side A or Side B designation of the PG device, but it depends only upon the ability of the PG to connect to the Call Router, and it ensures that only the PG side that has the best connection to the Call Router will attempt to go active.

Cisco CallManager Failure Scenarios

A duplex ICM model contains no single point of failure. However, there are scenarios where a combination of multiple failures can prevent IPCC from accepting new incoming calls. Also, if a component of the IPCC solution does not itself support redundancy and failover, existing calls on that component will be dropped. The following dual failures have the most impact on high availability; the Cisco CallManager Peripheral Interface Managers (PIMs) cannot activate if either of them occurs (see Figure 3-16):

PIM side A and the secondary CTI Manager that services the PIM on side B both fail.

PIM side B and the primary CTI Manager that services the PIM on side A both fail.

In either of these cases, the ICM will have no visibility into the Cisco CallManager cluster.

Figure 3-16 Cisco CallManager PGs Cannot Cross-Connect to Backup CTI Managers

ICM Failover Scenarios

This section describes how redundancy works in the following failure scenarios:

Scenario 1 - Cisco CallManager and CTI Manager Fail

Scenario 2 - Cisco CallManager PG Side A Fails

Scenario 3 - Only Cisco CallManager Fails

Scenario 4 - Only CTI Manager Fails

Scenario 1 - Cisco CallManager and CTI Manager Fail

Figure 3-17 shows a complete system failure or loss of network connectivity on Cisco CallManager A. The CTI Manager and Cisco CallManager services are both active on the same server, and Cisco CallManager A is the primary CTI Manager in this case. The following conditions apply to this scenario:

All phones and gateways are registered with Cisco CallManager A.

All phones and gateways are configured to re-home to Cisco CallManager B (that is, B is the backup).

Cisco CallManagers A and B are each running a separate instance of CTI Manager.

When all of the software services on CallManager Subscriber A fail (call processing, CTI Manager, and so on), all phones and gateways re-home to Cisco CallManager B.

PG side A detects a failure and induces a failover to PG side B.

PG side B becomes active and registers all dialed numbers and phones; call processing continues.

After an agent disconnects from all calls, the IP phone re-homes to the backup Cisco CallManager B. The agent will have to log in again manually using the agent desktop.

When Cisco CallManager A recovers, all phones and gateways re-home to it.

PG side B remains active, using the CTI Manager on Cisco CallManager B.

During this failure, any calls in progress at an IPCC agent will remain active. When the call is completed, the phone will re-home to the backup Cisco CallManager automatically.

After the failure is recovered, the PG will not fail back to the A side of the duplex pair. All CTI messaging will be handled using the CTI Manager on Cisco CallManager B, which will communicate to Cisco CallManager A to obtain phone state and call information.

Figure 3-17 Scenario 1 - Cisco CallManager and CTI Manager Fail

Scenario 2 - Cisco CallManager PG Side A Fails

Figure 3-18 shows a failure on PG side A and a failover to PG side B. All CTI Manager and Cisco CallManager services continue running normally. The following conditions apply to this scenario:

All phones and gateways are registered with Cisco CallManager A.

All phones and gateways are configured to re-home to Cisco CallManager B (that is, B is the backup).

Cisco CallManagers A and B are each running a separate instance of CTI Manager.

When PG side A fails, PG side B becomes active.

PG side B registers all dialed numbers and phones; call processing continues.

After an agent disconnects from all calls, that agent's desktop functionality is restored to the same state as before the failover.

When PG side A recovers, PG side B remains active and uses the CTI Manager on Cisco CallManager B.

Figure 3-18 Scenario 2 - Cisco CallManager PG Side A Fails

Scenario 3 - Only Cisco CallManager Fails

Figure 3-19 shows a failure on Cisco CallManager A. The CTI Manager services are running on Cisco CallManagers C and D, and Cisco CallManager C is acting as the primary CTI Manager. However, all phones and gateways are registered with Cisco CallManager A. During this failure, the Cisco CallManager PG is not affected because the PG communicates with the CTI Manager service, not the Cisco CallManager service. All phones re-home individually to the standby Cisco CallManager B if they are not in a call. If a phone is in a call, it re-homes to Cisco CallManager B after it disconnects from the call.

The following conditions apply to this scenario:

All phones and gateways are registered with Cisco CallManager A.

All phones and gateways are configured to re-home to Cisco CallManager B (that is, B is the backup).

Cisco CallManagers C and D are each running a separate instance of CTI Manager.

When Cisco CallManager A fails, phones and gateways re-home to Cisco CallManager B.

PG side A remains connected and active, with a CTI Manager connection on Cisco CallManager subscriber C. It does not fail-over because the JTAPI/CTI Manager connection has not failed. However, it will see the phones and devices being unregistered from Cisco CallManager subscriber A (where they were registered) and will then be notified of these devices being re-registered on Cisco CallManager subscriber B automatically. During the time that the agent phones are not registered, the PG will disable the agent desktops to prevent the agents from attempting to use the system while their phones are not actively registered with a Cisco CallManager subscriber.

Call processing continues for any devices not registered to Cisco CallManager subscriber A. Call processing also continues for those devices on subscriber A when they are re-registered with their backup subscriber.

Agents on an active call will stay in their connected state until they complete the call; however, the agent desktop will be disabled to prevent any conference, transfer, or other telephony events during the failover. After the agent disconnects the active call, that agent's phone will re-register with the backup subscriber, and the agent will have to log in again manually using the agent desktop.

When Cisco CallManager A recovers, phones and gateways re-home to it. This re-homing can be set up on Cisco CallManager to gracefully return groups of phones and devices over time or to require manual intervention during a maintenance window to minimize the impact to the call center.

Call processing continues normally after the phones and devices have returned to their original subscriber.

Figure 3-19 Scenario 3 - Only Cisco CallManager Fails

Scenario 4 - Only CTI Manager Fails

Figure 3-20 shows a CTI Manager service failure on Cisco CallManager C. The CTI Manager services are running on Cisco CallManagers C and D, and Cisco CallManager C is the primary CTI Manager. However, all phones and gateways are registered with Cisco CallManager A. During this failure, both the CTI Manager and the PG fail-over to their secondary sides. Because the JTAPI service on PG side B is already logged into the secondary (now primary) CTI Manager, the device registration and initialization time is significantly shorter than if the JTAPI service on PG side B had to log into the CTI Manager.

The following conditions apply to this scenario:

All phones and gateways are registered with Cisco CallManager A.

All phones and gateways are configured to re-home to Cisco CallManager B (that is, B is the backup).

Cisco CallManagers C and D are each running a separate instance of CTI Manager.

When the CTI Manager service on Cisco CallManager C fails, PG side A detects the failure and induces a failover to PG side B.

PG side B registers all dialed numbers and phones with Cisco CallManager D, and call processing continues.

After an agent disconnects from all calls, that agent's desktop functionality is restored to the same state as before the failover.

When the CTI Manager on Cisco CallManager C recovers, PG side B continues to be active and uses the CTI Manager on Cisco CallManager D.

Figure 3-20 Only CTI Manager Fails

IPCC Scenarios for Clustering over the WAN

IPCC Enterprise can also be overlaid on the Cisco CallManager design model for clustering over the WAN, which allows for high availability of Cisco CallManager resources across multiple sites and data center locations. There are a number of specific design requirements for Cisco CallManager to support this deployment model, and IPCC adds its own specific requirements and new failover considerations to the model.

Specific testing has been performed to identify the design requirements and failover scenarios, but no code changes were made to the core IPCC solution components to support this model. The success of this design model relies on specific network configuration and setup, and the network must be monitored and maintained. The component failure scenarios noted previously (see ICM Failover Scenarios) are still valid in this model, and the additional failure scenarios for this model include:

Scenario 1 - ICM Central Controller or Peripheral Gateway Private Network Fails

Scenario 2 - Visible Network Fails

Scenario 3 - Visible and Private Networks Both Fail (Dual Failure)

Scenario 4 - Remote Agent Location WAN Fails

Scenario 1 - ICM Central Controller or Peripheral Gateway Private Network Fails

In clustering over the WAN with IPCC, there must be a dedicated, isolated private network connection between the sides of the geographically distributed Central Controller (Call Router/Logger) and between the sides of the split Peripheral Gateway pair, to maintain state and synchronization between the sides of the system. UDP heartbeats are generated to verify the health of these links, and the ICM uses the heartbeats to detect a failure on the private link. Missing five consecutive heartbeats signals to the ICM that the link or the remote partner system might have failed.

If the private network fails between the ICM Central Controllers, the following conditions apply:

The Call Routers will detect the failure by missing five consecutive UDP heartbeats. Both Call Routers send a "test other side" (TOS) message to the Peripheral Gateways, starting with PG1A, then PG1B, then PG2A, and so forth. The TOS message requests the Peripheral Gateway to check if it can "see" the Call Router on the other side to determine if the failure is a network failure or a failure of the redundant pair.

The Call Routers verify which side sees more active connections to the Peripheral Gateways. That side will continue to function as the active Call Router in simplex mode, and the redundant Call Router will be disabled.

All the Peripheral Gateways will realign their active data feed to the active Call Router over the visible network, with no failover or loss of service.

There is no impact to the agents, calls in progress, or calls in queue. The system can continue to function normally; however, the Call Routers will run in simplex mode until the private network link is restored.

If the private network fails between the Cisco CallManager Peripheral Gateways, the following conditions apply:

The Peripheral Gateway sides will detect the failure by missing five consecutive UDP heartbeats. The Peripheral Gateways verify which side of the duplex pair has the active connection to the Cisco CallManager cluster.

The Peripheral Gateway side of the duplex pair that was actively connected to the Cisco CallManager cluster will continue to function as the active side of the pair, in simplex mode. The other side will be inactive until the private network connection is restored.

There is no impact to the agents, calls in progress, or calls in queue. The system can continue to function normally; however, the Peripheral Gateways will run in simplex mode until the private network link is restored.

If the two private network connections were combined into one link, the failures would follow the same path; however, the system would be running in simplex on both the Call Router and the Peripheral Gateway. If a second failure were to occur at that point, the system could lose some or all of the call routing and ACD functionality.

Scenario 2 - Visible Network Fails

The visible network in this design model is the network path between the data center locations where the main system components (Cisco CallManager subscribers, Peripheral Gateways, IP-IVR/ISN components, and so forth) are located. This network carries all of the voice traffic (RTP streams and call control signalling), the ICM CTI (call control signalling) traffic, and all typical data network traffic between the sites. In order to meet the requirements of Cisco CallManager clustering over the WAN, this link must be highly available, with very low latency and sufficient bandwidth. This link is critical to the IPCC design because it is part of the fault-tolerant design of the system, and it must be highly resilient as well.

If the visible network fails between the data center locations, the following conditions apply:

The Cisco CallManager subscribers will detect the failure and continue to function locally, with no impact to local call processing and call control. However, any calls that were set up over this WAN link will fail with the link.

The ICM Call Routers will detect the failure because the normal flow of TCP keep-alives from the remote Peripheral Gateways will stop. Likewise, the Peripheral Gateways will detect this failure by the loss of TCP keep-alives from the remote Call Routers. The Peripheral Gateways will automatically realign their data communications to the local Call Router, and the local Call Router will then use the private network to pass data to the Call Router on the other side to continue call processing. This does not cause a failover of the Peripheral Gateway or the Call Router.

Agents might be affected by this failure under the following circumstances:

If the agent desktop (Cisco Agent Desktop or CTI OS) is registered to the Peripheral Gateway on side A of the system but the physical phone is registered to side B of the Cisco CallManager cluster

Under normal circumstances, the phone events would be passed from side B to side A over the visible network via the CTI Manager Service to present these events to the side A Peripheral Gateway. The visible network failure will not force the IP Phone to re-home to side A of the cluster, and the phone will remain operational on the isolated side B. The Peripheral Gateway will no longer be able to see this phone, and the agent will be logged out of IPCC automatically because the system can no longer direct calls to the agent's phone.

If the agent desktop (Cisco Agent Desktop or CTI OS) and IP Phone are both registered to side A of the Peripheral Gateway and Cisco CallManager, but the phone is reset and re-registers to a side-B Cisco CallManager subscriber

If the IP Phone re-homes or is manually reset and forced to register to a side-B Cisco CallManager subscriber, the side-A Cisco CallManager subscriber that is providing the CTI Manager Service to the local Peripheral Gateway will unregister the phone and remove it from service. Because the visible network is down, the remote Cisco CallManager subscriber at side B cannot send the phone registration event to the remote Peripheral Gateway. IPCC will log out this agent because it can no longer control the phone for the agent.

If the agent desktop (CTI OS or Cisco Agent Desktop) is registered to the CTI OS Server at the side-B site but the active Peripheral Gateway side is at the side-A site

Under normal operation, the CTI OS desktop (and Cisco Agent Desktop Server) will load-balance their connections across the CTI OS Server pair. At any given time, half the agent connections would be on a CTI OS Server that has to cross the visible network to connect to the active Peripheral Gateway CTI Server (CG). When the visible network fails, the CTI OS Server detects the loss of connection with the remote Peripheral Gateway CTI Server (CG) and disconnects the active agent desktop clients to force them to re-home to the redundant CTI OS Server at the remote site. The CTI OS agent desktop is aware of the redundant CTI OS Server and will automatically use this server. During this transition, the agent desktop will be disabled, and it will return to an operational state as soon as it is connected to the redundant CTI OS Server. (The agent may be logged out or put into the not-ready state, depending upon the /LOAD parameter defined for the Cisco CallManager Peripheral Gateway in ICM Config Manager.)
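The desktop-side behavior resembles the following minimal sketch (hypothetical Python, not the CTI OS client; the host names and port number are placeholders): on losing its server connection, the client disables the user interface, tries the redundant server it already knows about, and becomes operational again once connected.

# Hypothetical sketch of CTI OS desktop failover between redundant servers.
# Not actual CTI OS client code; host names and port are placeholders.
import socket

def connect_with_failover(servers, port=42028, timeout=5.0):
    """Try each known CTI OS Server in turn; the desktop stays disabled
    while no connection exists and becomes operational once one succeeds."""
    for host in servers:
        try:
            sock = socket.create_connection((host, port), timeout=timeout)
            print(f"Desktop operational, connected to {host}")
            return sock
        except OSError:
            print(f"Desktop disabled, {host} unreachable; trying next server")
    raise ConnectionError("No CTI OS Server reachable; agent remains logged out")

if __name__ == "__main__":
    try:
        connect_with_failover(["ctios-a.example.com", "ctios-b.example.com"])
    except ConnectionError as err:
        print(err)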

Scenario 3 - Visible and Private Networks Both Fail (Dual Failure)

Individually, the private and visible networks can fail with limited impact to the IPCC agents and calls. However, if both of these networks fail at the same time, the system will be reduced to very limited functionality. This failure should be considered catastrophic and should be avoided by careful WAN design, with backup and resiliency built into the design.

If both the visible and private networks fail at the same time, the following conditions apply:

The Cisco CallManager subscribers will detect the failure and continue to function locally, with no impact to local call processing and call control. However, any calls that were set up over this WAN link will fail with the link.

The Call Routers and Peripheral Gateways will detect the private network failure after missing five consecutive UDP heartbeats. These heartbeats are generated every 100 ms, and the failure will be detected within about 500 ms on this link.

The Call Routers will attempt to contact their Peripheral Gateways with the "test other side" message to determine if the failure was a network issue or if the remote Call Router had failed and was no longer able to send heartbeats. The Call Routers will determine the side with the most active Peripheral Gateway connections, and that side will stay active in simplex mode while the remote Call Router will be in standby mode. The Call Routers will send a message to the Peripheral Gateways to realign their data feeds to the active call router only.

The Peripheral Gateways will determine which side has the active Cisco CallManager connection. However, each Peripheral Gateway also considers the state of the Call Router, and it will not remain active if it cannot connect to an active Call Router.

The surviving Call Router and Peripheral Gateways will detect the failure of the visible network by the loss of TCP keep-alives on the visible network. These keep-alives are sent every 400 ms, so it can take up to two seconds before this failure is detected. (The timing arithmetic for both detection mechanisms is shown in the sketch at the end of this scenario.)

The Call Router will be able to see only the local Peripheral Gateways, which are those used to control local IP-IVR or ISN ports and the local half of the CallManager Peripheral Gateway pair. The remote IP-IVR or ISN Peripheral Gateways will be off-line, taking them out of service in the ICM Call Routing Scripts (using the "peripheral on-line" status checks) and forcing any of the calls in progress on these devices to be disconnected. (ISN can redirect the calls upon failure.)

Any new calls that come into the disabled side will not be routed by the IPCC, but they can be redirected or handled using standard Cisco CallManager "redirect on failure" for their CTI Route Points.

Agents will be impacted as noted above if their IP Phones are registered to the side of the Cisco CallManager cluster opposite the location of their active Peripheral Gateway and CTI OS Server connection. Only agents that were active on the surviving side of the Peripheral Gateway with phones registered locally to that site will not be impacted.

At this point, the Call Router and Cisco CallManager Peripheral Gateway will run in simplex mode, and the system will accept new calls for IPCC call treatment only on the surviving side. The IP-IVR/ISN functionality will also be limited to the surviving side.
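The detection times in this scenario follow directly from the heartbeat arithmetic. A minimal sketch, assuming the intervals stated above; the five-miss threshold for the TCP keep-alives is an assumption consistent with the stated two-second figure:

# Worked arithmetic for the failure-detection times described above.

def detection_time_ms(interval_ms, missed_beats):
    return interval_ms * missed_beats

# Private network: UDP heartbeats every 100 ms, five consecutive misses.
print(detection_time_ms(100, 5), "ms to detect the private network failure")  # 500

# Visible network: TCP keep-alives every 400 ms; five misses is an assumed
# threshold that matches the "up to two seconds" detection time.
print(detection_time_ms(400, 5), "ms to detect the visible network failure")  # 2000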

Scenario 4 - Remote Agent Location WAN Fails

The IPCC design model for clustering over the WAN assumes the IPCC agents are remotely located at multiple sites connected by a WAN. Each agent location requires WAN connectivity to both of the data center locations where the Cisco CallManager and ICM components are located. These connections should be isolated, should provide redundancy, and should make use of basic SRST functionality so that, in the event of a complete network failure, the remote site still has basic dial-tone service for emergency (911) calls.

If side A of the WAN at the remote agent location fails, the following conditions apply:

Any IP phones that are homed to the side-A Cisco CallManager subscribers will automatically re-home to the side-B subscribers (provided the redundancy group is configured).

Agent desktops that are connected to the CTI OS or Cisco Agent Desktop server at that site will automatically realign to the redundant CTI OS server at the remote site. (The agent desktop will be disabled during the realignment process.)

If both sides of the WAN at the remote agent location fail, the following conditions apply:

The local voice gateway will detect the failure of the communications path to the Cisco CallManager cluster and will go into SRST mode to provide local dial-tone functionality.

The agent desktop will detect the loss of connectivity to the CTI OS Server (or Cisco Agent Desktop Server) and automatically log the agent out of the system. While the IP phones are in SRST mode, they will not be able to function as IPCC agents.

Understanding Failure Recovery

This section analyzes the failover recovery of each individual part (products and subcomponents inside each product) of the IPCC solution.

Cisco CallManager Service

In larger deployments, the Cisco CallManager server where the agent phones are registered might not be the one running the CTI Manager service that communicates with the Cisco CallManager PG. When an active Cisco CallManager service fails, all the devices registered to it are reported out of service by the CTI Manager service. From a Cisco CallManager reporting perspective, any calls in progress are terminated at the time of the failure, and the agents are logged out so that future calls are not routed to them. IP phones of agents not on calls at the time of failure will quickly register with the backup Cisco CallManager. The IP phone of an agent on a call at the time of failure will not register with the backup Cisco CallManager until the agent completes the current call. If MGCP gateways are used, calls in progress survive, but further call control functions (hold, retrieve, transfer, conference, and so on) are not possible.

When the active Cisco CallManager fails, the agent desktops show the agents as logged out, their IP phones display a message stating that the phone has gone off-line, and all the IP phone soft keys are grayed out until the phones fail over to the backup Cisco CallManager. To continue receiving calls, the agents must wait for their phones to re-register with a backup Cisco CallManager and for the CTI server to restore their desktop functionality to its state prior to the Cisco CallManager service failure. Upon recovery of the primary Cisco CallManager, the agent phones re-register with their original server because all the Cisco CallManager devices are forced to register with their home Cisco CallManager.
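The phone behavior can be summarized in a small sketch (hypothetical Python, not phone firmware logic): idle phones re-register with the backup immediately, phones on active calls defer re-registration until the call completes, and all phones re-home when the primary recovers.

# Hypothetical sketch of agent phone registration behavior across a
# Cisco CallManager service failure and recovery. Not firmware code.

def registration_target(on_call, primary_up):
    if primary_up:
        return "primary"   # devices are forced back to their home CallManager
    if on_call:
        return "defer"     # re-register with the backup after the call completes
    return "backup"        # idle phones re-register with the backup quickly

if __name__ == "__main__":
    print(registration_target(on_call=False, primary_up=False))  # backup
    print(registration_target(on_call=True, primary_up=False))   # defer
    print(registration_target(on_call=False, primary_up=True))   # primary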

In summary, the Cisco CallManager service is separate from the CTI Manager service, which talks to the Cisco CallManager PG via JTAPI. The Cisco CallManager service is responsible for registering the IP phones, and its failure does not affect the Cisco CallManager PGs. From a Cisco CallManager perspective, the PG does not go off-line because the Cisco CallManager server running CTI Manager remains operational. Therefore, the PG does not need to fail-over.

IP IVR (CRS)

When a CTI Manager fails, the IP IVR (CRS) JTAPI subsystem shuts down and restarts, trying to connect to the secondary CTI Manager, if a secondary is specified. In addition, all voice calls at this IP IVR are dropped. If a secondary CTI Manager is available, the IP IVR logs into it and re-registers all the CTI ports associated with the IP IVR JTAPI user. After all the Cisco CallManager devices are successfully registered with the IP IVR JTAPI user, the server resumes its Voice Response Unit (VRU) functions and handles new calls. This failure does not impact the Internet Service Node (ISN) because ISN does not depend upon the Cisco CallManager JTAPI service.
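The recovery sequence can be sketched as follows. This is an illustrative Python model, not the CRS software or its API; the class and function names are hypothetical.

# Hypothetical model of the IP IVR (CRS) JTAPI subsystem recovery sequence.
# Names are illustrative stand-ins, not the CRS API.

class CtiManagerSession:
    def __init__(self, host):
        self.host = host
        self.registered_ports = []

    def register(self, port):
        self.registered_ports.append(port)

def recover_jtapi_subsystem(secondary_host, cti_ports, active_calls):
    active_calls.clear()                    # all voice calls at this IP IVR drop
    if secondary_host is None:
        return None                         # no secondary configured; VRU stays down
    session = CtiManagerSession(secondary_host)   # log in to the secondary CTI Manager
    for port in cti_ports:
        session.register(port)              # re-register each CTI port for the JTAPI user
    return session                          # VRU functions resume; new calls handled

if __name__ == "__main__":
    calls = ["call-1", "call-2"]
    s = recover_jtapi_subsystem("ccm-b.example.com", ["CTI-Port-1", "CTI-Port-2"], calls)
    print("Calls after failover:", calls)               # []
    print("Re-registered ports:", s.registered_ports)   # ['CTI-Port-1', 'CTI-Port-2']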

ICM

The ICM is a collection of services and of processes within those services. The failover and recovery process for each of these services is unique and requires careful examination to understand the impact on other parts of the IPCC solution, including other ICM services.

As stated previously, all redundant ICM services discussed in this chapter must be located at the same site and connected through a private LAN. You can provide the private LAN by installing a second network interface card (NIC) in each server (sides A and B) and connecting them with a crossover cable. This approach eliminates external network equipment as a possible point of failure.

Cisco CallManager PG and CTI Manager Service

When the active CTI Manager or PG fails, JTAPI detects an OUT_OF_SERVICE event and induces a failover to the standby PG. Because the standby PG is already logged into the standby CTI Manager, it registers monitors for the phones with logged-in agents and for the configured dialed numbers and CTI route points. This initialization takes place at a rate of about five devices per second; for example, re-registering 600 devices would take roughly two minutes. The agent desktops show the agents as logged out, and a message is displayed stating that their routing client or peripheral (Cisco CallManager) has gone off-line. (This warning can be turned on or off, depending on the administrator's preference.) All agents lose their desktop functionality until the failure recovery is complete. The agents can recognize this event because the agent state display on their desktops will show logged out, and the login button will be the only button available. Any existing calls handled by the agents should remain alive without any impact to the callers.


Note Agents should not push any buttons during desktop failover because these keystrokes can be buffered and sent to the CTI server when it completes its failover and restores the agent states.


Once the CTI Manager or PG completes its failover, the agents can return to their previous call state (talking, ready, not ready, and so forth). At this point, the agents should also be able to release, transfer, or conference calls if they were on a call at the time of the failure. All the call data that had been collected and stored via a call data update message is retained on the agent desktops, recovered, and matched with call context information saved on the PG. However, all agents without active calls are reset to the default Not Ready state. In addition, the Longest Available Agent (LAA) algorithm resets the timers for all the agents to zero.

ICM Voice Response Unit PG

When a Voice Response Unit (VRU) PG fails, all the calls currently in queue on that IP IVR (CRS) are dropped. Calls queued in the Internet Service Node (ISN) are not dropped and will be redirected to a secondary ISN or number in the dial plan, if available. However, the Service Control Interface (SCI) link of the failed VRU PG automatically connects to the backup VRU PG so that all new calls can be handled properly. Upon recovery of the failed VRU PG, the currently running VRU PG continues to operate as the active VRU PG. Therefore, having redundant VRU PGs adds significant value because it allows an IP IVR to continue to function as an active IP IVR. Without VRU PG redundancy, a VRU PG failure would block use of that IP IVR even though the IP IVR is working properly. (See Figure 3-21.)

Figure 3-21 Redundant ICM VRU PGs with Two IP IVR Servers

ICM Call Router and Logger

The ICM Central Controllers or ICM Servers are shown in these diagrams as a single set of redundant servers. However, depending upon the size of the implementation, they could be deployed with multiple servers to host the following key software processes:

ICM Call Router

The ICM Call Router is the "brain" of the system that maintains a constant memory image of the state of all the agents, calls, and events in the system. It performs the call routing in the system, executing the user-created ICM Routing Scripts and populating the real-time reporting feeds for the Administrative Workstation. The Call Router software runs in synchronized execution, with both of the redundant servers running the same memory image of the current state across the system. They keep this information updated by passing the state events between the servers on the private LAN connection.

ICM Logger and Database Server

The ICM Logger and Database Server maintains the system database for the configuration (agent IDs, skill groups, call types, and so forth) and scripting (call flow scripts) as well as the historical data from call processing. The Loggers receive data from their local Call Router process to store in the system database. Because the Call Routers are synchronized, the Logger data is also synchronized. In the event that the two Logger databases are out of synchronization, they can be resynchronized manually by using the ICMDBA application over the private LAN. The Logger also provides a replication of its historical data to the customer Historical Database Server (HDS) Admin Workstations over the visible network.

If one of the ICM Call Routers fails, the surviving server will detect the failure after missing five consecutive heartbeats on the private LAN. The Call Routers generate these heartbeats every 100 ms, so it will take up to 500 ms to detect this failure. Upon detection of the failure, the surviving Call Router will contact the Peripheral Gateways in the system to verify the type of failure that occurred. The loss of heartbeats on the private network could be caused by either of the following conditions:

Private network outage — It is possible for the private LAN switch or WAN to be down but for both of the ICM Call Routers to still be fully operational. In this case, the Peripheral Gateways will still see both of the ICM Call Routers even though they cannot see each other over the private network. In this case, the Call Routers will both send a Test Other Side message to the PGs to determine if the Call Router on the other side is still operational and which side should be active. Based upon the messages from the PGs, the Call Router that previously had the most active PG connections would remain active in simplex mode, and the Call Router on the other side would go idle until the private network is restored.

Call Router hardware failure — It is possible for the Call Router on the other side to have a physical hardware failure and be completely out of service. In this case, only the surviving Call Router would be communicating with the Peripheral Gateways using the Test Other Side message. The Peripheral Gateways would report that they can no longer see the Call Router on the other side, and the surviving Call Router would take over the active processing role in simplex mode.

During the Call Router failover processing, any Route Requests sent to the Call Router from a Carrier Network Interface Controller (NIC) or Peripheral Gateway will be queued until the surviving Call Router is in active simplex mode. Any calls in progress in the IVR or at an agent will not be impacted.

If one of the ICM Logger and Database Servers were to fail, there would be no immediate impact except that the local Call Router would no longer be able to store data from call processing. The redundant Logger would continue to accept data from its local Call Router. When the Logger server is restored, the Logger will contact the redundant Logger to determine how long it had been off-line. If the Logger was off-line for less than 12 hours, it will automatically request all the transactions it missed from the redundant Logger while it was off-line. The Loggers maintain a recovery key that tracks the date and time of each entry recorded in the database, and these keys will be used to restore data to the failed Logger over the private network.

If the Logger was off-line for more than 12 hours, the system will not automatically resynchronize the databases. In this case, resynchronization has to be done manually using the ICMDBA application. Manual resynchronization allows the system administrator to decide when to perform this data transfer on the private network, perhaps scheduling it during a maintenance window when there would be little call processing activity in the system.
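The resynchronization decision can be modeled with a short sketch (hypothetical Python; recovery keys are modeled here simply as timestamps, which is an approximation of the actual key format):

# Illustrative sketch of the Logger resynchronization decision.
# Recovery keys are modeled as timestamps; the real key format differs.
from datetime import datetime, timedelta

RESYNC_WINDOW = timedelta(hours=12)

def resync_plan(last_recovery_key, now, peer_rows):
    """last_recovery_key: time of the last row the failed Logger recorded.
    peer_rows: (timestamp, row) pairs held by the surviving Logger."""
    if now - last_recovery_key > RESYNC_WINDOW:
        return "manual resynchronization required (use ICMDBA)"
    # Off-line for less than 12 hours: automatically request every
    # transaction newer than the last recovery key over the private network.
    return [row for ts, row in peer_rows if ts > last_recovery_key]

if __name__ == "__main__":
    now = datetime(2004, 6, 1, 12, 0)
    key = now - timedelta(hours=3)
    rows = [(now - timedelta(hours=5), "already stored"),
            (now - timedelta(hours=1), "missed transaction")]
    print(resync_plan(key, now, rows))  # ['missed transaction']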

The Logger replication process, which sends data from the Logger database to the HDS Admin Workstations, also automatically replicates each row written to the Logger database during this resynchronization.

There is no impact to call processing during a Logger failure; however, the HDS data that is replicated from that Logger would stop until the Logger can be restored.

Additionally, if the Outbound Option is used, the Campaign Manager software is loaded on only one of the Logger platforms (it must be Logger A). If that platform is out of service, all outbound calling will stop until the Logger can be restored to operational status.

Admin Workstation Real-Time Distributor (RTD)

The Administrative Workstation (AW) Real-Time Distributor (RTD) provides the user interface to the system for making configuration and scripting changes. It also hosts the web-based reporting tool (WebView) and the Internet Script Editor.

These servers do not support redundant or duplex operation, as the other ICM system components do. However, you can deploy multiple Administrative Workstation servers to provide redundancy for the IPCC. (See Figure 3-22.)

Figure 3-22 Redundant ICM Distributors and AW Servers

Administrative Workstation Real-Time Distributors are clients of the ICM Call Router real-time feed, which provides real-time information about the entire IPCC across the enterprise. Real-Time Distributors at the same site can be set up as part of an Admin Site that includes a designated primary real-time distributor and one or more secondary real-time distributors. Another option is to add Client Admin Workstations, which do not have their own local SQL databases and are homed to a Real-Time Distributor for their SQL database and real-time feed.

The Admin Site reduces the number of real-time feed clients the ICM Call Router has to service at a particular site. For remote sites, this is important because it can reduce the required bandwidth to support remote Admin Workstations across a WAN connection.

When using an Admin Site, the primary real-time distributor is the one that will register with the ICM Call Router for the real-time feed, and the other real-time distributors within that Admin Site register with the primary real-time distributor for the real-time feed. If the primary real-time distributor is down or does not accept the registration from the secondary real-time distributors, they will register with the ICM Call Router for the real-time feed. Client AWs that cannot register with the primary or secondary real-time distributors will not be able to perform any Admin Workstation tasks until the distributors are restored.
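The registration fallback chain can be sketched as follows (hypothetical Python; the role names mirror the description above, and the return value indicates where each distributor or Client AW obtains its real-time feed):

# Hypothetical sketch of real-time feed registration in an Admin Site.

def realtime_feed_source(role, primary_up, secondary_up):
    if role == "primary":
        return "ICM Call Router"           # the primary registers with the Router
    if role == "secondary":
        return ("primary distributor" if primary_up
                else "ICM Call Router")    # secondaries fall back to the Router
    if role == "client":                   # Client AWs have no local SQL database
        if primary_up:
            return "primary distributor"
        if secondary_up:
            return "secondary distributor"
        return None                        # no AW tasks until a distributor returns

if __name__ == "__main__":
    for role in ("primary", "secondary", "client"):
        print(role, "->", realtime_feed_source(role, primary_up=False, secondary_up=True))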

Alternatively, each real-time distributor could be deployed in its own Admin Site regardless of the physical site of the device. This will create more overhead for the ICM Call Router to maintain multiple real-time feed clients; however, it will prevent a failure of the primary real-time distributor from taking down the secondary distributors at the site.

Additionally, if the Admin Workstation is being used to host the ConAPI interface for the Multi-Channel Options (Cisco Email Manager and Cisco Collaboration Server), any configuration changes made to the ICM, Cisco Email Manager, or Cisco Collaboration Server systems will not be passed over the ConAPI interface until the Admin Workstation is restored.

CTI Server

The CTI Server monitors the PIM data traffic for specific CTI messages (such as "call ringing" or "off hook" events) and makes them available to CTI clients such as the CTI OS Server or Cisco Agent Desktop Enterprise Server. It also processes third-party call control messages (such as "make call" or "answer call") from the CTI clients and sends these messages via the PIM interface of the PG to Cisco CallManager to process the event on behalf of the agent desktop.

The CTI Server can be deployed redundantly on duplex servers or co-resident on the PG servers. (See Figure 3-23.) It does not, however, maintain agent state in the event of a failure. Upon failure of the CTI Server, the redundant CTI Server becomes active and begins processing call events. The CTI OS Server is a client of the CTI Server and is designed to monitor both CTI Servers in a duplex environment and maintain the agent state during failover processing. CTI OS agents will see their desktop buttons gray out during the failover, preventing them from attempting to perform tasks while the CTI Server is down. The buttons will be restored as soon as the connection to the redundant CTI Server is established, and the agent does not have to log on again to the desktop application.

The CTI Server is also critical to the operation of the Multi-Channel Options (Cisco Email Manager and Cisco Collaboration Server) as well as the Outbound Option. If the CTI Server is down on both sides of the duplex agent Peripheral Gateway pair, none of the agents for that agent Peripheral Gateway will be able to log into these applications.

Figure 3-23 Redundant CTI Servers with No Cisco Agent Desktop Server Installed

CTI OS Considerations

CTI OS acts as a client of the CTI Server and provides agent and supervisor desktop functionality for IPCC. It manages agent state and functionality during a failover of the CTI Server, and it can be deployed as redundant CTI OS Servers. The CTI OS Agent Desktop load-balances the agents between the redundant servers automatically, so agents sitting next to each other may in fact be registered to two different CTI OS Servers.

The CTI Object Server (CTI OS) consists of two services: the CTI OS service and the CTI driver. If either of these fails, the active CTI OS fails over to its peer server. Therefore, it is important to keep both of these services active at all times.

Cisco Agent Desktop Considerations

Cisco Agent Desktop is a client of CTI OS, which provides automatic failover and redundancy for the Cisco Agent Desktop Server. If the Cisco CallManager Peripheral Gateway or CTI Server (CG) fails over, CTI OS maintains the agent state and information during the failover, preventing agents from being logged out by the system.

The Cisco Agent Desktop Servers (Enterprise Server, Chat, RASCAL, and so forth) can also be deployed redundantly to allow for failover of the core Cisco Agent Desktop components. The Cisco Agent Desktop software is aware of the redundant Cisco Agent Desktop Servers and will automatically fail over in the event of a Cisco Agent Desktop Server process or hardware failure.

Other Considerations

An IPCC failover can affect other parts of the solution. Although IPCC may stay up and running, some data could be lost during its failover, or other products that depend on IPCC to function properly might not be able to handle an IPCC failover. This section examines what happens to other critical areas in the IPCC solution during and after failover.

Reporting

The IPCC reporting feature uses real-time, five-minute, and half-hour intervals to build its reporting database. At the end of each five-minute and half-hour interval, each Peripheral Gateway gathers the data it has kept locally and sends it to the Call Routers. The Call Routers process the data and send it to their local Logger and Database Servers for historical data storage. If the deployment has the Historical Data Server (HDS) option, that data is then replicated from the Logger to the HDS server as it is written to the Logger database.

The Peripheral Gateways provide buffering (in memory and on disk) of the five-minute and half-hour data collected by the system to handle network connectivity failures or slow network response as well as automatic retransmission of data when the network service is restored. However, physical failure of both Peripheral Gateways in a redundant pair can result in loss of the half-hour or five-minute data that has not been transmitted to the Central Controller. Cisco recommends the use of redundant Peripheral Gateways to reduce the chance of losing both physical hardware devices and their associated data during an outage window.
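The buffering behavior amounts to a store-and-forward queue, as in the following sketch (hypothetical Python; the real Peripheral Gateway buffers both in memory and on disk, which this model does not show):

# Illustrative store-and-forward model of PG reporting-data buffering.
from collections import deque

class ReportingBuffer:
    def __init__(self):
        self.pending = deque()               # intervals awaiting transmission

    def add_interval(self, interval_data):
        self.pending.append(interval_data)   # buffer 5-minute / half-hour data

    def transmit(self, link_up):
        """Send buffered intervals only while the visible network is up;
        buffered data survives an outage but not the loss of both PGs."""
        sent = []
        while link_up and self.pending:
            sent.append(self.pending.popleft())
        return sent

if __name__ == "__main__":
    buf = ReportingBuffer()
    buf.add_interval("five-minute data, 10:00")
    buf.add_interval("half-hour data, 10:30")
    print(buf.transmit(link_up=False))  # [] -- outage, data retained
    print(buf.transmit(link_up=True))   # both intervals retransmitted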

When agents log out, all their reporting statistics stop. The next time the agents log in, their real-time statistics start from zero. Typically, ICM failover does not force the agents to log out; however, it does reset their agent statistics when the failover is complete, although their agent desktop functionality is restored to its pre-failover state.

For further information, refer to the Cisco IP Contact Center Reporting Guide, available at

http://www.cisco.com/univercd/cc/td/doc/product/icm/icmentpr/icm50doc/icm5rept/index.htm