
Cisco UCS B-Series Blade Servers

Field Notice: FN - 63430 - UCS B440 MOSFET Failure Can Cause Overheated Components Leading to Blade Shutdown - Hardware Replacement Ordering Procedure Revised

Revised August 13, 2012

July 12, 2011


NOTICE:

THIS FIELD NOTICE IS PROVIDED ON AN "AS IS" BASIS AND DOES NOT IMPLY ANY KIND OF GUARANTEE OR WARRANTY, INCLUDING THE WARRANTY OF MERCHANTABILITY. YOUR USE OF THE INFORMATION ON THE FIELD NOTICE OR MATERIALS LINKED FROM THE FIELD NOTICE IS AT YOUR OWN RISK. CISCO RESERVES THE RIGHT TO CHANGE OR UPDATE THIS FIELD NOTICE AT ANY TIME.

Revision History

Revision  Date  Comment
1.4  13-AUG-2012  Updates to title, Problem Description, Workaround/Solution (and order form), How to Identify Hardware Levels, and For More Information sections.
1.3  05-JUL-2012  Updates to title, Problem Description, Background, Workaround/Solution (and order form process), How to Identify Hardware Levels, and For More Information sections.
1.2  07-FEB-2012  Updates to Products Affected section
1.1  26-JAN-2012  Updates made to Products Affected, Problem Description, Background, Workaround/Solution, and For More Information sections.
1.0  12-JUL-2011  Initial Public Release

Products Affected

Products Affected  Comments
B440-BASE-M2  UCS B440 M2 Blade Server
B440-BASE-M2=  UCS B440 M2 Blade Server (spare)
B440-BASE-M2D  Alternate identifier for UCS B440 M2 Blade Server
B440-BASE-M2UPG  Alternate identifier for UCS B440 M2 Blade Server
B440M1-BUN1  Product ID is bundle containing affected UCS B440 blade
B440M1-BUN2  Product ID is bundle containing affected UCS B440 blade
N20-B6740-2  UCS B440 M1 Blade Server
N20-B6740-2-UPG  Alternate identifier for UCS B440 M1 Blade Server
N20-B6740-2=  UCS B440 M1 Blade Server
N20-B6740-2D  Alternate identifier for UCS B440 M1 Blade Server
UCS-B440M2-VCDL1  Alternate identifier for UCS B440 M2 Blade Server

Cisco confirms that this Field Notice relates to one UCS product - the UCS B440 Blade Server. The Product IDs listed above are unique identifiers used by Cisco customers to order different packages that include a UCS B440 Blade Server.
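
For sites that track blades in an inventory export, the table above can be applied as a simple lookup. The following Python sketch is illustrative only: the Product IDs are copied from the table, while the function names and the (serial, PID) inventory layout are assumptions made for this example and are not part of any Cisco tool.

```python
# Product IDs copied from the Products Affected table in this notice.
AFFECTED_PIDS = {
    "B440-BASE-M2",
    "B440-BASE-M2=",
    "B440-BASE-M2D",
    "B440-BASE-M2UPG",
    "B440M1-BUN1",
    "B440M1-BUN2",
    "N20-B6740-2",
    "N20-B6740-2-UPG",
    "N20-B6740-2=",
    "N20-B6740-2D",
    "UCS-B440M2-VCDL1",
}


def is_affected_pid(pid):
    """Return True if the PID appears in the Products Affected table."""
    return pid.strip().upper() in AFFECTED_PIDS


def affected_entries(inventory):
    """Filter a hypothetical list of (serial, pid) pairs down to affected PIDs."""
    return [(serial, pid) for serial, pid in inventory if is_affected_pid(pid)]
```

A matching Product ID only shows that the unit is a UCS B440 Blade Server. Whether a given blade needs replacement still depends on its hardware version, as described in the How To Identify Hardware Levels section.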

Problem Description

Failure of a MOSFET power transistor on the blade server may cause the component to overheat and emit a short flash, and may lead to board failure. In some circumstances, this failure may affect the other blades in the chassis by disrupting power flow.

Update 1:
The firmware upgrade initially prescribed in this Field Notice has successfully detected component failures and shut down servers as expected. Since this upgrade was released, however, a MOSFET failure on a UCS B440 Blade Server has resulted in a second thermal event. Cisco has determined that a hardware modification to the UCS B440 Blade Server is appropriate. No other UCS hardware is affected. A UCS B440 Blade Server hardware replacement program has been launched.

Update 2:
The ordering process has been updated to the order form process noted in the Workaround/Solution section of this Field Notice.

Update 3:
The hardware version identification process has been updated to note a software defect preventing proper version display in UCSM.

Background

A failure has been observed in which a MOSFET power transistor overheated and emitted a flash before failing. A firmware fix has been developed as a preventative measure to avoid the overheating and flash event if a MOSFET fails.

There is no indication of a systemic issue with the MOSFET components, and the observed failure in the field is considered to be a random component failure. The firmware upgrade is intended to be a preventative measure to avoid shorted out components and any effects on other installed elements within the UCS chassis.

Update 1:
The firmware upgrade initially prescribed in this Field Notice has successfully detected FET failures and shut down servers, preventing a potential thermal event. Since this upgrade was released, however, a FET failure on a UCS B440 Blade Server has resulted in a second incident as described above. Cisco is directly contacting UCS B440 Blade Server customers and will replace UCS B440 Blade Servers currently deployed at customer sites. Cisco is making UCS B440 Blade Server hardware modifications, and a hardware replacement program has been launched. No other UCS hardware is affected.

Update 2:
The majority of UCS B440 Blade Servers affected by this issue have been addressed through direct contact by Cisco or an authorized partner. Cisco customers with affected Blade Servers who have not yet taken advantage of the proactive replacement program are eligible for replacement through the ordering process in the Workaround/Solution section of this Field Notice.

Problem Symptoms

There is no symptom during normal operation. If the MOSFET fails in a shorted mode, it may emit a flash, and the failure will lead to a system board failure.

Workaround/Solution

Cisco recommends replacing UCS B440 M1 and UCS B440 M2 Blade Servers at hardware version level 01 with blades at version level 02. Cisco will provide replacement Blade Servers free of charge. See the How To Identify Hardware Levels section to determine whether a version 01 UCS B440 M1 or M2 Blade Server is installed. Blade Servers running hardware version 01 are eligible for replacement. To order replacement Blade Server(s), download and save the order form and follow the instructions it contains. The replacement UCS B440 order form is located here:

http://www.cisco.com/en/US/ts/fn/634/order_63430.xls

DDTS

To follow the bug ID link below and see detailed bug information, you must be a registered customer and you must be logged in.

DDTS  Description
CSCtz65329 (registered customers only)  Part number in UCSM doesn't match mctools

How To Identify Hardware Levels

UCS B440 M1 version 01 Blade Servers should be replaced with UCS B440 M1 version 02.
UCS B440 M2 version 01 Blade Servers should be replaced with UCS B440 M2 version 02.

Use any of the following methods to determine whether you have a version 01 Blade Server requiring replacement:

Method 1: External Visual Inspection
The bezel of a UCS B440 M1 or B440 M2 version 01 Blade Server is printed with black characters on a light background. Version 01 Blade Servers should be replaced.
The bezel of a UCS B440 M1 or B440 M2 version 02 Blade Server is printed with light characters on a black background.

Method 2: Part Number Label
The Cisco Product ID/Version ID (PID/VID) label displays hardware Version ID (VID) 02 for units containing the hardware update for this issue.

Method 3: UCS Manager Display
A software defect in UCS Manager (UCSM) versions prior to release 2.0(4) causes the version information to be displayed incorrectly. Do not use the UCSM display to determine the hardware version unless UCSM is running version 2.0(4) or higher. When using an appropriate version of UCSM, a new UCS B440 Blade Server can be identified by a Version ID (VID) level of 02 or higher. UCSM version 2.0(4) is scheduled for release in calendar Q3 of 2012, and will address the related defect noted in the DDTS table of this document.
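
The rule in Method 3 (trust the VID shown by UCSM only at release 2.0(4) or higher, then treat VID 02 or higher as updated hardware) can be sketched in Python as follows. This is a minimal illustration, not Cisco code: the version-string pattern (major.minor, then a maintenance number and optional letter in parentheses) is an assumption about how UCSM release strings are written, and all function names are hypothetical.

```python
import re

# Assumed shape of a UCSM release string: major.minor(maintenance[letter]),
# for example "2.0(4)" as written in this notice. Any trailing patch letter
# is ignored for the comparison.
_VERSION_RE = re.compile(r"^\s*(\d+)\.(\d+)\((\d+)[a-z]*\)\s*$", re.IGNORECASE)

# First release in which the version display defect is expected to be fixed.
MIN_TRUSTED = (2, 0, 4)


def parse_ucsm_version(text):
    """Convert a string such as '2.0(4)' into a comparable tuple (2, 0, 4)."""
    match = _VERSION_RE.match(text)
    if match is None:
        raise ValueError("unrecognized UCSM version string: %r" % (text,))
    return tuple(int(group) for group in match.groups())


def vid_display_is_trustworthy(ucsm_version):
    """Return True if UCSM is at release 2.0(4) or higher."""
    return parse_ucsm_version(ucsm_version) >= MIN_TRUSTED


def needs_replacement_per_ucsm(ucsm_version, vid):
    """Return True or False from the displayed VID, or None if UCSM is too old.

    `vid` is the numeric Version ID level (1 for VID 01, 2 for VID 02).
    """
    if not vid_display_is_trustworthy(ucsm_version):
        return None  # Fall back to Method 1, 2, or 4.
    return vid < 2
```

Returning None rather than a guess keeps the older-UCSM case explicit, so a caller is pushed toward one of the other identification methods.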

Method 4: Retrieving Blade Type via Show Tech Support
This method confirms a version 02 Blade Server by retrieving board part number information through the show tech-support command.
Example:

server-A# connect local-mgmt
server-A(local-mgmt)# show tech-support chassis 1 cimc 1 detail
The detailed tech-support information is located at workspace:///techsupport/20120524222323_server_BC1_CIMC01.tar
server-A(local-mgmt)#
server-A(local-mgmt)# cd techsupport/
server-A(local-mgmt)# copy ./20120524222323_server_BC1_CIMC01.tar scp://root@192.168.1.2/tftpboot/  (note: use your own destination address)
 

After copying off and untarring the tech support file, open and search the Blade Details txt file as shown below:
 
            20120524222323_server_BC1_CIMC01/tmp/CIMC1_TechSupport.txt
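
Once the archive has been copied to a workstation, pulling out the text file can be scripted. The Python sketch below uses only the standard library and assumes that the file of interest ends in "_TechSupport.txt", as in the path above; the function name and the destination directory are hypothetical.

```python
import tarfile
from pathlib import Path


def extract_techsupport_text(tar_path, dest_dir, suffix="_TechSupport.txt"):
    """Copy every archive member whose name ends in `suffix` into dest_dir.

    Files are written under their base name only, so member paths inside
    the archive cannot place files outside dest_dir.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    with tarfile.open(tar_path) as archive:
        for member in archive.getmembers():
            if not (member.isfile() and member.name.endswith(suffix)):
                continue
            source = archive.extractfile(member)
            if source is None:
                continue
            target = dest / Path(member.name).name
            target.write_bytes(source.read())
            written.append(target)
    return written
```

For example, extract_techsupport_text("20120524222323_server_BC1_CIMC01.tar", "techsupport_out") would be expected to return the path of the extracted CIMC1_TechSupport.txt, provided the archive is laid out as shown above.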
 
Search for ShowFru and find Board Part Number. Example:
 
======================[  Board Area ]=========================
Language Code                    : English
MFG Date / Time                  : 02  0A  0C
Board MFG Type / Len             : [Cisco Systems Inc]
Board Product Name               : [B440-BASE-M2]         <<< Blade type
Board Serial Number              : [FCH160671UP]
Board Part Number                : [73-13497-03]         <<< Part Number
Board FRU File ID                : [EM-2]
BOM/Hw/PID                       : [30 34 01 56 30 30 ]
CLEI Code                        : [0000000000]

Key:
            Original B440 M1 Part Numbers: 73-12462-01 to 73-12462-08
            Updated B440 M1 Part Numbers: 73-14927-01 and later
            Original B440 M2 Part Numbers: 73-13497-01 to 73-13497-02
            Updated B440 M2 Part Numbers: 73-13497-03 and later
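
The key above can be applied mechanically to the ShowFru text. The following Python sketch extracts the Board Part Number field and classifies it. It is an illustration under stated assumptions: "and later" is read as a higher trailing revision number on the same base part number, the field layout is taken from the example output above, and the function names are hypothetical.

```python
import re

# Matches the "Board Part Number : [73-13497-03]" line from the ShowFru output.
_FIELD_RE = re.compile(r"^\s*Board Part Number\s*:\s*\[([^\]]*)\]", re.MULTILINE)

# Part numbers in the key have the form 73-<base>-<revision>.
_PART_RE = re.compile(r"^73-(\d{5})-(\d+)$")


def board_part_number(showfru_text):
    """Return the Board Part Number found in the text, or None if absent."""
    match = _FIELD_RE.search(showfru_text)
    return match.group(1).strip() if match else None


def classify_part_number(part_number):
    """Return 'original', 'updated', or 'unknown' according to the key."""
    match = _PART_RE.match(part_number.strip())
    if match is None:
        return "unknown"
    base, revision = match.group(1), int(match.group(2))
    if base == "12462":  # Original B440 M1: revisions 01 to 08
        return "original" if 1 <= revision <= 8 else "unknown"
    if base == "14927":  # Updated B440 M1: revision 01 and later
        return "updated" if revision >= 1 else "unknown"
    if base == "13497":  # B440 M2: 01 to 02 original, 03 and later updated
        if 1 <= revision <= 2:
            return "original"
        if revision >= 3:
            return "updated"
    return "unknown"
```

Applied to the example output above, board_part_number returns 73-13497-03, which classify_part_number reports as updated. A result of unknown means the part number falls outside the ranges listed in the key and should be checked by one of the other methods.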

For More Information

If you require further assistance, or if you have any further questions regarding this field notice, please contact the Cisco Systems Technical Assistance Center (TAC).

Receive Email Notification For New Field Notices

Cisco Notification Service—Set up a profile to receive email updates about reliability, safety, network security, and end-of-sale issues for the Cisco products you specify.