Cisco 7800 Series Media Convergence Servers

Field Notice: FN - 63270 - MCS-7835-I2, MCS-7845-I2, RAID Driver can cause Service Outage, SW upgrade required

Field Notice: FN - 63270 - MCS-7835-I2, MCS-7845-I2, RAID Driver can cause Service Outage, SW upgrade required

November 4, 2009


NOTICE:

THIS FIELD NOTICE IS PROVIDED ON AN "AS IS" BASIS AND DOES NOT IMPLY ANY KIND OF GUARANTEE OR WARRANTY, INCLUDING THE WARRANTY OF MERCHANTABILITY. YOUR USE OF THE INFORMATION ON THE FIELD NOTICE OR MATERIALS LINKED FROM THE FIELD NOTICE IS AT YOUR OWN RISK. CISCO RESERVES THE RIGHT TO CHANGE OR UPDATE THIS FIELD NOTICE AT ANY TIME.

Revision History

Revision | Date | Comment
1.0 | 04-NOV-2009 | Initial Public Release

Products Affected

Products Affected | Comments
MCS-7835-I2-xxx | 7835-I2 Bare Metal, Appliance Servers, and IBM x3650 Customer-provided servers.
MCS-7845-I2-xxx | 7845-I2 Bare Metal, Appliance Servers, and IBM x3650 Customer-provided servers.

Problem Description

To be exposed to this issue, a server in the Products Affected section above must be running one of the affected application versions listed in the table found in the Workaround/Solution section.

Certain IBM RAID driver versions can cause instability in the RAID environment and lead to hard drives being marked as Read-Only. The affected RAID drivers are contained in certain application versions listed in the table in the Workaround/Solution section. Because Unified Communications applications require write access to the RAID array, this problem can prevent critical files from being written to the array and can eventually cause a service outage.

For a list of affected and fixed Application versions, please see the table in the Workaround/Solution section.

Either the Root or Common partition can become Read-Only.

Background

In this Field Notice, the term "Appliance Server" refers to a turnkey software appliance, which is a server purchased from Cisco that has software pre-installed before shipping.

Problem Symptoms

Affected servers using one of the affected versions may suddenly experience a loss of service.

There are two ways to determine if a system is affected:

  1. CLI/Console Commands
  2. Examine log files

To determine if the Root partition is affected, do the following from a CLI or console session to the server:

  1. Type the command "utils iothrottle enable" without quotes and press Enter.
  2. The server should return the message "I/O throttling has been enabled".
  3. Type the command "utils iothrottle disable" without quotes and press Enter.
  4. The server should return the message "I/O throttling has been disabled".

If the server does not return those messages, the Root partition is in Read-Only mode.
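If the two command outputs are captured as text (for example, from a console log), the pass/fail rule above reduces to a simple check. This is a minimal sketch under stated assumptions: the function name `root_partition_read_only` is hypothetical, and it assumes the captured output contains the confirmation messages verbatim, ignoring case.

```python
# Hypothetical helper: applies the Root partition check from this field notice
# to captured output of "utils iothrottle enable" and "utils iothrottle disable".
ENABLE_OK = "i/o throttling has been enabled"
DISABLE_OK = "i/o throttling has been disabled"


def root_partition_read_only(enable_output: str, disable_output: str) -> bool:
    """Return True if either expected confirmation message is missing.

    Assumption: the messages appear verbatim (case-insensitive) in the output.
    A missing message means the Root partition is treated as Read-Only.
    """
    enabled = ENABLE_OK in enable_output.lower()
    disabled = DISABLE_OK in disable_output.lower()
    return not (enabled and disabled)
```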

To determine if the Common partition is affected, do the following from a CLI or console session to the server:

  1. Enter the command "file list activelog syslog/* detail" without quotes and press Enter.
  2. The server returns the size of the file "CiscoSyslog", for example: 589,219 CiscoSyslog
  3. Two minutes later, repeat step 1.
  4. Compare the two sizes. If the size is unchanged, the Common partition is in Read-Only mode.
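The size comparison above can be sketched as follows. The function names are hypothetical, and the parsing assumes (based only on the example "589,219 CiscoSyslog") that the size appears immediately before the file name with comma thousands separators; the exact listing layout may differ on a real server.

```python
import re
from typing import Optional

# Assumption: the size directly precedes the file name, e.g. "589,219 CiscoSyslog".
# The trailing \s*$ avoids matching rotated files such as "CiscoSyslog.1".
_SIZE_RE = re.compile(r"([\d,]+)\s+CiscoSyslog\s*$", re.MULTILINE)


def ciscosyslog_size(listing: str) -> Optional[int]:
    """Extract the CiscoSyslog size from one listing, or None if absent."""
    match = _SIZE_RE.search(listing)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


def common_partition_read_only(first_listing: str, second_listing: str) -> bool:
    """Compare two listings taken about two minutes apart.

    An unchanged CiscoSyslog size suggests the Common partition is Read-Only.
    """
    first = ciscosyslog_size(first_listing)
    second = ciscosyslog_size(second_listing)
    if first is None or second is None:
        raise ValueError("CiscoSyslog entry not found in listing")
    return first == second
```

A file that is still growing between the two listings indicates that the Common partition is still writable.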

Any of the following log messages may be visible on the server:

From the RTMT-System Logs or "messages" file on the local server:

kern 2 kernel: EXT3-fs error (device sda2): ext3_journal_start_sb: Detected aborted journal

kern 2 kernel: Remounting filesystem read-only

From the RTMT-Application Logs or "CiscoSyslog" file:

SyslogSeverityMatchFound events generated: SeverityMatch - Critical kernel: EXT3-fs error (device sdb1) in start_transaction: Journal has aborted

-or-

SyslogSeverityMatchFound Detail:SyslogSeverityMatchFound events generated: SeverityMatch - Critical kernel: EXT3-fs error (device sdb1) in start_transaction: Journal has aborted

From the Console:

EXT3-fs error (device sda6) in start_transaction: Journal has aborted
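When reviewing exported "messages" or "CiscoSyslog" files, the signatures above can be searched for mechanically. This is a minimal sketch: `find_readonly_symptoms` is a hypothetical name, and the patterns are derived only from the example messages in this notice, so other wordings of the same failure would not be caught.

```python
import re
from typing import List

# Patterns derived from the example log lines in this field notice.
_SIGNATURES = [
    re.compile(r"EXT3-fs error \(device \w+\).*Detected aborted journal"),
    re.compile(r"Remounting filesystem read-only"),
    re.compile(r"EXT3-fs error \(device \w+\) in start_transaction: Journal has aborted"),
]


def find_readonly_symptoms(log_text: str) -> List[str]:
    """Return every log line matching one of the known symptom patterns."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if any(sig.search(line) for sig in _SIGNATURES)
    ]
```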

Workaround/Solution

Do not request any Hardware Replacements (RMA) to resolve this issue. This issue is recoverable via SW upgrades in the field.

Workaround & Recovery

A temporary workaround is to use the Unified Communications Manager Recovery CD to restore write access to the file system. The Unified Communications Manager Recovery CD can be used on any server and application experiencing this issue. This process can be used to recover both Root and Common partition file systems that are affected. The latest Unified Communications Manager Recovery CD is available here:

Unified Communications Manager 7.1(3a) Recovery CD (registered customers only)

Use the following steps to recover write access to the file system:

  1. Boot the system using the recovery disk.
  2. From the recovery CD menu, select option 'f' to run a file system check.
  3. When it completes, select option 'q' to quit the recovery CD.
  4. Eject the CD when prompted and reboot the system.

Solution

The permanent solution is to migrate to a fixed version either by upgrading or performing a fresh install. The exact action required depends on which partition is affected.

Use the commands shown in the Problem Symptoms section to determine which partition is affected. Do not assume that the server file system is healthy just because service has not yet been affected.

If neither the Root nor Common partition is affected, then:

  1. Perform a backup of the existing version, following the instructions for that application.
  2. Upgrade to a fixed version following the normal upgrade procedures for that application.
  3. Perform a backup of the upgraded version.

If only the Root partition is affected, then:

  1. Use the Recovery CD to restore the file system, following the instructions above from the Workaround section.
  2. Perform a backup of the existing version, following the instructions for that application.
  3. Upgrade to a fixed version following the normal upgrade procedures for that application.
  4. Perform a backup of the upgraded version.

If the Common partition is affected, and a valid backup exists, then:

  1. Perform a Fresh install of the current/affected version.
  2. Restore the system from the backup data.
  3. Upgrade to a fixed version.
  4. Perform a backup of the upgraded version.

Please complete steps 2 and 3 as soon as possible after performing the Fresh Install, because the system remains potentially exposed to the issue until they are completed.

If the Common partition is affected, and a valid backup does not exist, then perform a Fresh Install of a fixed version.
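The choice between the four procedures above can be summarized as a small decision function. This is an illustrative sketch only; `recovery_plan` and its step wording are hypothetical paraphrases of the lists above. It checks the Common partition first, because the Common partition procedures apply whether or not the Root partition is also affected.

```python
from typing import List


def recovery_plan(root_affected: bool, common_affected: bool,
                  valid_backup: bool) -> List[str]:
    """Return the ordered steps from this notice for the given server state."""
    if common_affected:
        if valid_backup:
            return [
                "Fresh install of the current/affected version",
                "Restore the system from the backup data",
                "Upgrade to a fixed version",
                "Back up the upgraded version",
            ]
        return ["Fresh install of a fixed version"]

    steps: List[str] = []
    if root_affected:
        # Only the Root partition is affected: recover write access first.
        steps.append("Restore the file system with the Recovery CD")
    steps += [
        "Back up the existing version",
        "Upgrade to a fixed version",
        "Back up the upgraded version",
    ]
    return steps
```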

See the following table showing affected and fixed versions:

Product | Bug ID & Link to Bug Toolkit (registered customers only) | Affected Version | Fixed Version | Availability of Solution
Cisco Unified Communications Manager | | 7.x | 7.0(2a)SU2 and later; 7.1(2b) and later | All available on Software Download site
Cisco Unity Connection | | 7.x | 7.0(2a)SU2 and later; 7.1(2b) and later | All available on Software Download site
Cisco Unified Presence | | 7.0(x) | 7.0(5) and later | Available on Software Download site
Cisco Emergency Responder | | 7.0(3a) | 7.1(1) and later | Available on Software Download site
Cisco Unified Mobility Advantage | | 7.1(x) | 7.1(3) and later | Available on Software Download site
Cisco Unified Mobility (also known as MobilityManager) | Not Affected | Not Affected | Not Affected | Not Affected
Cisco MeetingPlace Express | Not Affected | Not Affected | Not Affected | Not Affected
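As a rough aid for checking an inventory against the table above, the sketch below encodes the minimum fixed version for each release train. It is a hypothetical helper, not a Cisco tool. It assumes version strings in the "M.m(x[letter])[SUn]" form used in the table (for example "7.0(2a)SU2"), treats a release train newer than any listed train as outside the table's scope (returning None), and omits the two products listed as Not Affected.

```python
import re
from typing import Dict, List, Optional, Tuple

# Assumed format: major.minor(maintenance[letter])[SUn], e.g. "7.0(2a)SU2".
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\((\d+)([a-z]?)\)(?:SU(\d+))?$")

Version = Tuple[int, int, int, str, int]

# Minimum fixed version per release train, transcribed from the table above.
FIXED_MINIMUMS: Dict[str, List[str]] = {
    "Cisco Unified Communications Manager": ["7.0(2a)SU2", "7.1(2b)"],
    "Cisco Unity Connection": ["7.0(2a)SU2", "7.1(2b)"],
    "Cisco Unified Presence": ["7.0(5)"],
    "Cisco Emergency Responder": ["7.1(1)"],
    "Cisco Unified Mobility Advantage": ["7.1(3)"],
}


def parse_version(text: str) -> Version:
    """Parse a version string into a tuple that compares in release order."""
    m = _VERSION_RE.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    major, minor, maint, letter, su = m.groups()
    # An empty letter sorts before "a", so 7.0(2) < 7.0(2a).
    return (int(major), int(minor), int(maint), letter, int(su or 0))


def is_fixed(product: str, version: str) -> Optional[bool]:
    """True if at or above the fixed version for its train, False if not,
    None if the train is newer than any train listed for the product."""
    v = parse_version(version)
    minimums = [parse_version(s) for s in FIXED_MINIMUMS[product]]
    for t in minimums:
        if v[:2] == t[:2]:
            return v >= t
    if v[:2] > max(t[:2] for t in minimums):
        return None  # outside this table; consult the product release notes
    return False
```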

For More Information

If you require further assistance, or if you have any further questions regarding this field notice, please contact the Cisco Systems Technical Assistance Center (TAC) by one of the following methods:
