

Field Notice: FN - 63442 - LSI RAID Controller Chip Potential Premature Failure - Hardware Replacement Required

Revised June 25, 2013
September 21, 2011


NOTICE:

THIS FIELD NOTICE IS PROVIDED ON AN "AS IS" BASIS AND DOES NOT IMPLY ANY KIND OF GUARANTEE OR WARRANTY, INCLUDING THE WARRANTY OF MERCHANTABILITY. YOUR USE OF THE INFORMATION ON THE FIELD NOTICE OR MATERIALS LINKED FROM THE FIELD NOTICE IS AT YOUR OWN RISK. CISCO RESERVES THE RIGHT TO CHANGE OR UPDATE THIS FIELD NOTICE AT ANY TIME.

Revision History

Revision  Date          Comment
2.0       25-JUN-2013   Proactive replacement program closed. Change to standard RMA process.
1.0       21-SEPT-2011  Initial Public Release

Products Affected

Products Affected   Comments
MDE-RAID01-CTRL     Some affected units installed as option to MDE-1100-K9 Media Delivery Engine 1100
DMS-PCIE-RAID       Some affected units installed as option to SNS-SVR-C200WG-K9 or SNS-SVR-C200WG-K9= DMS Show and Share Server
N20-B6625-1
N20-B6625-1-UPG
N20-B6625-1=
N20-B6625-1D
R2X0-ML002          Some affected units installed as option under these PIDs: N1K-C1010, R200-1120402W, R200-BUN-1, R200-BUN-2, R200-BUN-3, R200-BUN-4, R210-2121605W, R210-BUN-2, R210-STAND-CNFGW, UCS-SP2-C210V, UCS-SP-C210E
R2X0-ML002=
UCS-B200M2-VCS1

Problem Description

A quality problem exists in the LSI 1064e RAID controller installed in certain Cisco products, which causes the potential for premature failure.

Affected units were shipped from Cisco Systems between June 2, 2011, and July 8, 2011, and are identifiable by serial number.

Background

Affected RAID controllers should be replaced based on a recommendation from LSI. Affected units are either permanently installed on server blades or are in a mezzanine form factor in rack servers.

Certain lots shipped to Cisco are affected by a quality issue. During the manufacturing process, a problem with a plasma cleaner caused a percentage of products to develop an intermetallic crack (IMC) at the interface between the Ball Grid Array (BGA) ball and package pad. The IMC is not isolated to a specific pin and can result in an open circuit on any pin.

Failure rate projection is up to 3% in the first 6 months of use. There is a 3% chance of data corruption if a failure occurs due to the described issue.
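
The two figures above can be combined into a rough per-unit estimate. The sketch below assumes that the 3% data corruption figure is conditional on a failure caused by this issue, as the wording suggests, and treats the 3% failure rate as an upper bound. The function name is illustrative only.

```python
def corruption_risk(failure_rate: float, corruption_given_failure: float) -> float:
    """Probability that a unit both fails and corrupts data.

    Computed as P(failure) * P(corruption | failure).
    """
    return failure_rate * corruption_given_failure


# Figures from this notice: up to 3% failure in the first 6 months of use,
# and a 3% chance of data corruption if such a failure occurs.
risk = corruption_risk(0.03, 0.03)
print(f"{risk:.4%}")  # 0.0900%
```

Under these assumptions, the chance that a given affected unit fails and corrupts data in the first 6 months is at most about 0.09%.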

Problem Symptoms

In most cases, an LSI 1064e controller from the defective batch/lot will not be functional, and there will be premature failure(s).

Workaround/Solution

Replace the affected hardware. Follow the instructions in the How to Identify Hardware Levels section to determine if you have an affected RAID Controller mezzanine card or B200 M2 blade server. Contact Cisco Technical Support if you require hardware replacement due to this issue.

How To Identify Hardware Levels

Affected blade servers were shipped from Cisco Systems between June 9, 2011, and July 8, 2011.

Affected mezzanine cards were shipped from Cisco Systems between June 2, 2011, and June 16, 2011.
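
The ship date windows above can be used as a first filter when the ship date of a unit is known. The sketch below is illustrative only: the names are hypothetical, the window bounds are assumed to be inclusive, and a date inside the window does not confirm that a unit is affected. Serial number validation is still required.

```python
from datetime import date

# Ship windows as stated in this notice (inclusive bounds assumed).
SHIP_WINDOWS = {
    "blade": (date(2011, 6, 9), date(2011, 7, 8)),
    "mezzanine": (date(2011, 6, 2), date(2011, 6, 16)),
}


def in_affected_window(unit_type: str, ship_date: date) -> bool:
    """Return True if ship_date falls inside the window for unit_type."""
    start, end = SHIP_WINDOWS[unit_type]
    return start <= ship_date <= end


print(in_affected_window("blade", date(2011, 6, 5)))      # False
print(in_affected_window("mezzanine", date(2011, 6, 5)))  # True
```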

The procedures below describe how to identify specific affected units.

Blade Server Identification

  1. Confirm that your blade server(s) is identified by one of these product IDs: N20-B6625-1, N20-B6625-1=, N20-B6625-1D, N20-B6625-1-UPG, or UCS-B200M2-VCS1.
  2. Log the serial number(s) of the potentially affected blade server(s) for validation. Use one of these methods to retrieve the serial number and product ID:
    • Method 1: Physically inspect the blade server. Product information is displayed on a sticker on the bottom of the sheet metal blade housing, on a pullout tab on the face of the unit, or on a sticker directly on the face of the unit.
    • Method 2: Log in to UCS Manager, and navigate to Equipment > chassis number > blade number. The product ID and serial number are displayed (as shown in this image):
  3. Use the Cisco Blade Server Serial Number Validation Tool to determine whether the blade server serial number(s) is affected.
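
For sites with many blades, steps 1 and 2 can be scripted against an exported inventory. The sketch below assumes the inventory is available as (product ID, serial number) pairs. The function name and the sample serial numbers are hypothetical. It only selects the blades whose product ID is listed in step 1, and the resulting serial numbers must still be checked with the validation tool in step 3.

```python
# Blade server product IDs listed in step 1 of this notice.
AFFECTED_BLADE_PIDS = {
    "N20-B6625-1",
    "N20-B6625-1=",
    "N20-B6625-1D",
    "N20-B6625-1-UPG",
    "UCS-B200M2-VCS1",
}


def blades_to_validate(inventory):
    """Return serial numbers of blades whose product ID is on the list.

    inventory: iterable of (product_id, serial_number) tuples.
    """
    return [
        serial
        for pid, serial in inventory
        if pid.strip().upper() in AFFECTED_BLADE_PIDS
    ]


# Hypothetical inventory; the serial numbers are placeholders.
inventory = [
    ("N20-B6625-1", "EXAMPLE0001"),
    ("OTHER-PID", "EXAMPLE0002"),
    ("ucs-b200m2-vcs1", "EXAMPLE0003"),
]
print(blades_to_validate(inventory))  # ['EXAMPLE0001', 'EXAMPLE0003']
```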

LSI 1064e RAID Controller Mezzanine Cards Identification

  1. LSI 1064e RAID controllers sold as spares by Cisco have the product ID and serial number labeled on the box. If you have product ID R2X0-ML002=, check the box and note the serial number for validation with the tool below. This image illustrates product ID and serial number locations on the box label:
  2. LSI 1064e RAID controller mezzanine cards installed in Cisco rack servers do not have an electronically identifiable serial number, so they must be physically inspected in order to determine the serial number. This operation requires removal of the chassis cover. In order to reduce this disruptive activity as much as possible, you can check the chassis serial number to validate whether it was originally shipped with an affected LSI 1064e RAID controller. The mezzanine card must still be physically inspected, but a user can avoid opening a server unnecessarily by checking the chassis serial number first. The first step is to confirm that the rack server has one of these affected product IDs: N1K-C1010, R200-1120402W, R200-BUN-1, R200-BUN-2, R200-BUN-3, R200-BUN-4, R210-2121605W, R210-BUN-2, R210-STAND-CNFGW, UCS-SP2-C210V, or UCS-SP-C210E.
  3. For affected rack server(s), note the chassis serial number(s) for validation. Use one of these methods to retrieve the serial number and product ID:
    • Method 1: Physically inspect the rack server. Product information is displayed on a sticker on the bottom of the unit. The serial number is also on a sticker placed on the left front mounting ear.
    • Method 2: Log in to the Cisco Integrated Management Controller (CIMC), and note the serial number information displayed on the summary page (as shown in this image):
    • Method 3: For systems integrated with UCS Manager (UCSM) environments, log in to UCS Manager, and navigate to Equipment > Rack-Mounts > rack server name. The product ID and serial number are displayed (as shown in this image):
    • Method 4: Server chassis product IDs and serial numbers can be retrieved via the command-line interface with these commands: scope chassis and show detail. For example:
  4.       [servername]$ ssh server_address
           password: <password>
           servername# scope chassis
           servername /chassis # show detail
           Chassis:
           Power: on
           Serial Number: QCI140205ZZ
           Product Name: UCS C210 M2
           PID : R210-2121605W
           UUID: F2A5E738-D8FE-DE11-76AE-8843E138ADA4
           Locator LED: off
           Description:
           Power Restore Policy: power-off
           Power Delay Type: fixed
           Power Delay Value(sec): 0
  5. Use the Cisco Rack Server Serial Number Validation Tool to check the rack server serial numbers to see if they were shipped with an affected LSI 1064e RAID controller.
  6. The validation tool flags rack servers that might have shipped with an affected LSI 1064e RAID controller. For each flagged rack server, the cover must be removed, and the LSI 1064e RAID controller serial number must be retrieved in order to make the final determination of whether it is affected. Instructions for access to the mezzanine card can be found in the Installing a Mezzanine Card section of these Cisco documents:

    These images show the location of the LSI 1064e mezzanine card within the rack server chassis:

    The HDD cable must be temporarily removed in order to view the serial number. These images show the serial number location on the LSI 1064e RAID controller mezzanine card:

  7. After collection of potentially affected LSI 1064e RAID controller mezzanine card serial numbers, confirm whether the units are affected by entering the serial number(s) into the LSI 1064e Mezzanine Card Serial Number Validation Tool.
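
The chassis check in the procedure above can be partly automated by parsing captured show detail output. The sketch below assumes the output has the "Field: value" layout shown in the Method 4 example. The function names are hypothetical. It extracts the chassis serial number and product ID and reports whether the product ID is on the rack server list in this notice. It does not replace the Cisco validation tools or the physical inspection of the mezzanine card.

```python
# Rack server product IDs listed in this notice.
AFFECTED_RACK_PIDS = {
    "N1K-C1010", "R200-1120402W", "R200-BUN-1", "R200-BUN-2",
    "R200-BUN-3", "R200-BUN-4", "R210-2121605W", "R210-BUN-2",
    "R210-STAND-CNFGW", "UCS-SP2-C210V", "UCS-SP-C210E",
}


def parse_show_detail(text: str) -> dict:
    """Parse 'Field: value' lines into a dictionary."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon
            fields[key.strip()] = value.strip()
    return fields


def needs_chassis_validation(fields: dict) -> bool:
    """True if the chassis product ID is on the affected rack server list."""
    return fields.get("PID", "") in AFFECTED_RACK_PIDS


# Output copied from the Method 4 example in this notice.
sample = """Chassis:
Power: on
Serial Number: QCI140205ZZ
Product Name: UCS C210 M2
PID : R210-2121605W
UUID: F2A5E738-D8FE-DE11-76AE-8843E138ADA4
"""

fields = parse_show_detail(sample)
print(fields["Serial Number"], needs_chassis_validation(fields))
# QCI140205ZZ True
```

A result of True only means that the chassis serial number should be entered into the Cisco Rack Server Serial Number Validation Tool.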

For More Information

If you require further assistance, or if you have any further questions regarding this field notice, please contact the Cisco Systems Technical Assistance Center (TAC) by one of the following methods:

Receive Email Notification For New Field Notices

Cisco Notification Service—Set up a profile to receive email updates about reliability, safety, network security, and end-of-sale issues for the Cisco products you specify.