Cisco Content Delivery Engine Series

Field Notice: FN - 63785 - VDS-TV: CDE220, CDE420, CDE250, CDE460, and CDE470 Servers Might Encounter a Soft Lockup - Workaround and Fix Available

April 7, 2014


NOTICE:

THIS FIELD NOTICE IS PROVIDED ON AN "AS IS" BASIS AND DOES NOT IMPLY ANY KIND OF GUARANTEE OR WARRANTY, INCLUDING THE WARRANTY OF MERCHANTABILITY. YOUR USE OF THE INFORMATION ON THE FIELD NOTICE OR MATERIALS LINKED FROM THE FIELD NOTICE IS AT YOUR OWN RISK. CISCO RESERVES THE RIGHT TO CHANGE OR UPDATE THIS FIELD NOTICE AT ANY TIME.

Revision History

Revision  Date         Comment
1.0       07-APR-2014  Initial Public Release

Products Affected

Products Affected  Comments
CDS-TV - 3.0.1     Release 3.2.6 has the fix
CDS-TV - 3.0.2     Release 3.2.6 has the fix
CDS-TV - 3.0.3     Release 3.2.6 has the fix
CDS-TV - 3.2.1     Release 3.2.6 has the fix
CDS-TV - 3.2.2     Release 3.2.6 has the fix
CDS-TV - 3.2.3     Release 3.2.6 has the fix
CDS-TV - 3.2.5     Release 3.2.6 has the fix
CDS-TV - 3.3.1     Release 3.4.2 has the fix
CDS-TV - 3.4.1     Release 3.4.2 has the fix
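
For inventory scripts, the table above can be encoded as a simple lookup. The following is a minimal sketch only: the names FIXED_IN and fixed_release are hypothetical and not part of any Cisco tool, and the mapping contains nothing beyond the rows listed in this notice.

```python
# Hypothetical lookup built from the "Products Affected" table in this notice.
# Keys are affected releases; values are the release that has the fix.
FIXED_IN = {
    "3.0.1": "3.2.6",
    "3.0.2": "3.2.6",
    "3.0.3": "3.2.6",
    "3.2.1": "3.2.6",
    "3.2.2": "3.2.6",
    "3.2.3": "3.2.6",
    "3.2.5": "3.2.6",
    "3.3.1": "3.4.2",
    "3.4.1": "3.4.2",
}


def fixed_release(current):
    """Return the release that has the fix, or None if `current` is not listed."""
    return FIXED_IN.get(current.strip())


if __name__ == "__main__":
    for release in ("3.2.5", "3.4.1", "3.2.6"):
        print(release, "->", fixed_release(release))
```

A result of None means only that the release is not listed in the table; it does not by itself prove that the release is unaffected.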

Problem Description

CDE220, CDE420, CDE250, CDE460, and CDE470 servers that run Cisco Videoscape Distribution Suite for Television (VDS-TV) and VDS-Recorder Release 3.x.x might encounter a CPU soft lockup (hang or kernel crash). When this occurs, access through Secure Shell (SSH), Simple Network Management Protocol (SNMP), or a connected keyboard is blocked. Prior to the hang, the servers appear to function normally.

This issue is observed on CDE systems with VDS-TV/VDS-Recorder applications:

  • CDE250, CDE220, and CDE420 servers that have run for more than 208 days since the last reboot.
  • CDE460 and CDE470 servers that have run for more than 208 days since the last power cycle.

Note: This issue does not occur on CDE110 servers because they run Linux kernel Version 2.6.18.

Background

CDE servers with the VDS-TV application that run Release 3.x.x might crash or hang due to a CPU lockup.

This issue is caused by a Linux kernel defect. The affected Linux kernel version is 2.6.32.

Note: This issue does not exist in CDE servers with VDS-TV releases earlier than Release 3.x.x (that run Linux Version 2.6.18).
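
The kernel-version condition above can be checked from a script. This is a minimal sketch, assuming Python is available on the host being checked; the function name kernel_is_affected is hypothetical, and the release strings in the example are illustrative rather than taken from an actual CDE server.

```python
import platform


def kernel_is_affected(release=None):
    """Return True if the kernel release string belongs to the 2.6.32 series.

    `release` defaults to the running kernel as reported by platform.release().
    Only the first three numeric components are compared, so a suffix such as
    "-358.el6.x86_64" is ignored.
    """
    if release is None:
        release = platform.release()
    base = release.split("-")[0]          # drop any distribution suffix
    return base.split(".")[:3] == ["2", "6", "32"]


if __name__ == "__main__":
    # Illustrative strings only; real servers may report different suffixes.
    for example in ("2.6.32-358.el6.x86_64", "2.6.18-53.el5"):
        print(example, "affected:", kernel_is_affected(example))
```

A match indicates only that the kernel series is the one named in this notice. Whether a given server already has the fix depends on the installed VDS-TV release.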

The soft lockup issue is VDS-TV specific and does not apply to other applications such as Video Distribution Suite for Internet Streaming (VDS-IS) or Visual Quality Experience (VQE).

The command for VDS-TV software release and Linux kernel version is shown here:

Problem Symptoms

On CDE250s, retrieved logs indicate an error as listed here:

As a result of this issue, the CDE system might hit a divide-by-zero kernel crash and might misidentify CPU hog events. When this issue occurs, the system might enter a hang state with no SSH or SNMP access.

Workaround/Solution

There are two alternatives to address this issue:

  • Upgrade to VDS-TV Release 3.2.6, Release 3.4.2, or later. These releases contain the permanent kernel fix.

    Cisco highly recommends that you upgrade to one of these releases.

  • Power cycle or reboot the server in a maintenance window

    • CDE220, CDE420 and CDE250

      Proactively reboot systems whose uptime is approaching 200 days. This action resets the CPU's timestamp counter (TSC).

      An uptime command example is shown here:

      [root@Utah722 ~]# uptime
      09:15:05 up 45 days, 18:12, 1 user, load average: 1.00, 1.00, 1.00

    • CDE460 and CDE470

      For these two platforms, a tool is available that must be run once a day during non-prime time. The tool reports how many days remain until the issue occurs. When the tool reports eight days, power cycle the server. This action resets the CPU's TSC.

      Note: This tool can also be used on CDE250, CDE220, and CDE420.
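
The proactive-reboot check for CDE220, CDE420, and CDE250 servers can be sketched as a small script. This is not the Cisco tool mentioned above; it is an illustrative sketch that assumes Python is available on the server and reads the standard Linux /proc/uptime file. The function names and the status strings are hypothetical. The 200-day and 208-day thresholds come from this notice. Because uptime measures time since the last reboot, not since the last power cycle, this check does not apply to CDE460 and CDE470 servers.

```python
SECONDS_PER_DAY = 86400
REBOOT_AT_DAYS = 200   # proactive reboot threshold suggested in this notice
LOCKUP_AT_DAYS = 208   # uptime beyond which the lockup has been observed


def days_from_proc_uptime(text):
    """Convert the contents of /proc/uptime to days of uptime.

    The first field of /proc/uptime is the uptime in seconds.
    """
    return float(text.split()[0]) / SECONDS_PER_DAY


def assess(days):
    """Classify an uptime in days against the thresholds in this notice."""
    if days >= LOCKUP_AT_DAYS:
        return "at risk"
    if days >= REBOOT_AT_DAYS:
        return "reboot"
    return "ok"


if __name__ == "__main__":
    with open("/proc/uptime") as f:
        days = days_from_proc_uptime(f.read())
    remaining = max(0.0, LOCKUP_AT_DAYS - days)
    print("uptime: %.1f days, status: %s, days until 208: %.1f"
          % (days, assess(days), remaining))
```

Run from a daily scheduled job, a script along these lines would flag a server before it crosses the 208-day mark, so that the reboot can be planned for a maintenance window.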

DDTS

To follow the bug ID link below and see detailed bug information, you must be a registered customer and you must be logged in.

DDTS                                    Description
CSCui91625 (registered customers only)  Kernel2.6.32: System freezes OR Crashes after Uptime reaches 208+ days

For More Information

If you require further assistance, or if you have any further questions regarding this field notice, please contact the Cisco Systems Technical Assistance Center (TAC) by one of the following methods:
