Cisco Unified Contact Center Enterprise

Field Notice: FN - 62508 - Unified Contact Center Enterprise/Hosted, Unified ICM Enterprise/Hosted TCP/IP Private Path Network Failure - Windows Server 2003 SP1


August 21, 2006

NOTICE:

THIS FIELD NOTICE IS PROVIDED ON AN "AS IS" BASIS AND DOES NOT IMPLY ANY KIND OF GUARANTEE OR WARRANTY, INCLUDING THE WARRANTY OF MERCHANTABILITY. YOUR USE OF THE INFORMATION ON THE FIELD NOTICE OR MATERIALS LINKED FROM THE FIELD NOTICE IS AT YOUR OWN RISK. CISCO RESERVES THE RIGHT TO CHANGE OR UPDATE THIS FIELD NOTICE AT ANY TIME.


Products Affected

ICM Enterprise - 7.0 and higher

ICM Hosted - 7.0 and higher

IPCC Enterprise - 7.0 and higher

IPCC Hosted - 7.0 and higher

Problem Description

Depending on specific network error rates, topology, and configuration details, Contact Center customers deploying the products named above may experience periodic failure of the private network path (dedicated path) between the A and B duplexed servers of the central controller (CallRouter) and/or Peripheral Gateway (PG). The application reacts as designed by reverting to a simplex operational mode, after which recovery procedures commence automatically to restore duplex operation. Depending on network characteristics, the frequency of the failure/recovery cycle can vary widely, from a rare episode to once every few minutes. System transitions at that higher frequency can impede efficient system operation and degrade call control processing.

The problem is specific to version 7.0 and later of the Enterprise Contact Center software running on Microsoft Windows Server 2003 SP1, both Standard and Enterprise editions. It may occur with or without the elective configuration of QoS packet marking on the interface. This issue does not impact Enterprise Contact Center software prior to version 7.0 running on Microsoft Windows 2000, nor does it apply to Cisco Unified Contact Center Express or to add-on feature options.

Background

Cisco has discovered that the specific TCP traffic characteristics of the Message Delivery System (MDS) component of the Contact Center Enterprise and Hosted products can, under specific TCP packet loss and retransmission scenarios, expose a logic condition in Microsoft's implementation of the Windows Server 2003 TCP/IP stack. That condition can in turn cause the round-trip time (RTT), and the resulting application transmit delay, to grow substantially. At the TCP layer, the issue typically manifests as a halt in Selective ACK window progress (the right edge of the Selective ACK stops being updated), which effectively retards packet throughput. The phenomenon can cause application transmit delays of many seconds on a low-latency, high-bandwidth WAN. Once the delay crosses the defensive MDS application round-trip timer threshold, it triggers an intentional application connection reset and an MDS transition to the simplex state.

It is important to note that similar WAN private network failure scenarios may be caused by a number of product deployment errors unrelated to this specific issue. Examples include lack of sufficient network bandwidth, insufficient traffic classification / prioritization, or incorrect NIC adapter configuration. Given the complexity of definitively diagnosing the specific issue addressed in this article, Cisco recommends the corresponding Microsoft hotfix be applied unconditionally if evidence of private network failure is seen but is not otherwise explained by the aforementioned deployment errors. Customers continuing to experience network issues with the Contact Center products should of course continue to work with their Cisco Certified partner or the Cisco Technical Assistance Center.

Problem Symptoms

Problem symptoms are limited to private network failures specific to Windows Server 2003 deployments, and are indicative of the described underlying problem only in the absence of any configuration or deployment error. The corresponding MDS logfile will show an unexpected network failure, with a message indicating that connectivity with a duplexed partner has been lost due to a failure of the private network. The occurrence will be accompanied by lower-level (EMT) communication errors 258 (wait operation timed out) and 10053 (established connection was aborted) at each respective end station.
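As a rough aid for spotting these symptoms, the sketch below scans lines of text for the private network failure phrase and for the two error numbers named above. It assumes the MDS log has already been exported to plain text; the log layout is not assumed, and the function name find_symptoms is hypothetical. Because it matches the bare numbers 258 and 10053 anywhere on a line, unrelated fields that happen to contain those numbers will also be reported, so hits need manual review.

```python
import re
from typing import Iterable, List, Tuple

# Phrase and error numbers taken from the notice; the log layout itself is not assumed.
PARTNER_LOST = "failure of the private network"
EMT_ERRORS = {
    258: "wait operation timed out",
    10053: "established connection was aborted",
}
_ERROR_RE = re.compile(r"\b(258|10053)\b")


def find_symptoms(lines: Iterable[str]) -> List[Tuple[int, str, str]]:
    """Return (line_number, symptom, text) for each symptom found (hypothetical helper)."""
    hits = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        # Case-insensitive match on the failure phrase described in the notice.
        if PARTNER_LOST in text.lower():
            hits.append((number, "private network failure", text))
        # Whole-number match on the two EMT error codes.
        for match in _ERROR_RE.finditer(text):
            code = int(match.group(1))
            hits.append((number, f"error {code} ({EMT_ERRORS[code]})", text))
    return hits
```

The function accepts any iterable of strings, so an open text file can be passed to it directly. A hit is only a prompt for further diagnosis, since the same symptoms can result from the deployment errors described under Background.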

Workaround/Solution

Microsoft has acknowledged the shortcoming in their TCP implementation and has responded with a Hotfix and associated Knowledge Base article. We strongly encourage all applicable 7.0 and later Cisco Contact Center customers to apply the hotfix at their earliest convenience, as per the guidance provided by Microsoft.

Reference Microsoft Knowledge Base (KB) article 922972, or use the Microsoft site KB search tools. Instructions for obtaining the hotfix itself are contained within the KB article.

Note that the KB article describes a more innocuous scenario in which small data packets may cause poor performance. In the context of the applicable products, the MDS communications control component deems this degraded performance a threat to call processing reliability and, as described above, intentionally forces the system into a simplex operational mode.

While Cisco strongly recommends application of the Hotfix in cases where evidence of the failure is seen and not otherwise explained, it is possible that such customers may see relief by instead disabling the TCP Selective ACK option on both Windows Server 2003 SP1 machines comprising the private path end-stations. Selective ACK is enabled by default with Windows Server 2003 SP1. Per Microsoft, this is not an actual mitigation to the root cause issue, but Cisco has found it effective in working around the underlying problem in at least one large enterprise deployment. The Selective ACK option is disabled by creating (or, if present, modifying) a registry value called SackOpts found in the Tcpip\Parameters key under System\CurrentControlSet\Services in the HKEY_LOCAL_MACHINE hive. Reference the Microsoft Windows Server 2003 documentation for further information.
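The registry change described above can be sketched as follows. This is a minimal illustration, not a Cisco or Microsoft procedure. The notice does not state the value type or data; the sketch assumes SackOpts is a REG_DWORD where 0 disables Selective ACK and 1 enables it, which should be confirmed against the Microsoft documentation. The function names build_reg_file and set_sack_opts are hypothetical.

```python
import sys

# Registry location named in the notice (HKEY_LOCAL_MACHINE hive).
TCPIP_PARAMETERS_KEY = r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
SACK_VALUE_NAME = "SackOpts"
SACK_DISABLED = 0  # assumption: REG_DWORD, 0 = Selective ACK off
SACK_ENABLED = 1   # assumption: REG_DWORD, 1 = Selective ACK on


def build_reg_file(enabled: bool = False) -> str:
    """Return .reg file text that sets SackOpts, for review before importing."""
    data = SACK_ENABLED if enabled else SACK_DISABLED
    return (
        "Windows Registry Editor Version 5.00\r\n"
        "\r\n"
        f"[HKEY_LOCAL_MACHINE\\{TCPIP_PARAMETERS_KEY}]\r\n"
        f'"{SACK_VALUE_NAME}"=dword:{data:08x}\r\n'
    )


def set_sack_opts(enabled: bool = False) -> None:
    """Create or overwrite SackOpts directly (Windows only, administrator rights)."""
    if sys.platform != "win32":
        raise OSError("SackOpts applies only to Windows hosts")
    import winreg  # standard library module, present only on Windows

    data = SACK_ENABLED if enabled else SACK_DISABLED
    with winreg.OpenKey(
        winreg.HKEY_LOCAL_MACHINE, TCPIP_PARAMETERS_KEY, 0, winreg.KEY_SET_VALUE
    ) as key:
        # SetValueEx creates the value if absent and overwrites it if present.
        winreg.SetValueEx(key, SACK_VALUE_NAME, 0, winreg.REG_DWORD, data)
```

Calling build_reg_file() returns text that can be saved and inspected before it is imported on each of the two private path end stations. TCP/IP parameter changes of this kind typically require a restart to take effect, so the change is best scheduled within a maintenance window.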

DDTS

To follow the bug ID link below and see detailed bug information, you must be a registered user and you must be logged in.

DDTS                                      Description
CSCse66172 (registered customers only)    MDS AppRTT failure introduced by delay in MS Win2K3 TCP/IP stack

Revision History

Revision    Date           Comment
1.0         21-AUG-2006    Initial Public Release

For More Information

If you require further assistance, or if you have any further questions regarding this field notice, please contact the Cisco Systems Technical Assistance Center (TAC).
