Field Notice: FN - 63430 - UCS B440 MOSFET Failure Can Cause Overheated Components Leading to Blade Shutdown - Hardware Replacement Required

Revised February 7, 2014

July 12, 2011


NOTICE:

THIS FIELD NOTICE IS PROVIDED ON AN "AS IS" BASIS AND DOES NOT IMPLY ANY KIND OF GUARANTEE OR WARRANTY, INCLUDING THE WARRANTY OF MERCHANTABILITY. YOUR USE OF THE INFORMATION ON THE FIELD NOTICE OR MATERIALS LINKED FROM THE FIELD NOTICE IS AT YOUR OWN RISK. CISCO RESERVES THE RIGHT TO CHANGE OR UPDATE THIS FIELD NOTICE AT ANY TIME.

Revision History

Revision  Date         Comment
1.5       07-FEB-2014  Updates to title, Problem Description, Background, Workaround/Solution, and How to Identify Hardware Levels sections.
1.4       13-AUG-2012  Updates to title, Problem Description, Workaround/Solution (and order form), How to Identify Hardware Levels, and For More Information sections.
1.3       05-JUL-2012  Updates to title, Problem Description, Background, Workaround/Solution (and order form process), How to Identify Hardware Levels, and For More Information sections.
1.2       07-FEB-2012  Updates to Products Affected section.
1.1       26-JAN-2012  Updates made to Products Affected, Problem Description, Background, Workaround/Solution, and For More Information sections.
1.0       12-JUL-2011  Initial Public Release

Products Affected

Products Affected  Comments
B440-BASE-M2       UCS B440 M2 Blade Server
B440-BASE-M2=      UCS B440 M2 Blade Server (spare)
B440-BASE-M2D      Alternate identifier for UCS B440 M2 Blade Server
B440-BASE-M2UPG    Alternate identifier for UCS B440 M2 Blade Server
B440M1-BUN1        Product ID is bundle containing affected UCS B440 blade
B440M1-BUN2        Product ID is bundle containing affected UCS B440 blade
N20-B6740-2        UCS B440 M1 Blade Server
N20-B6740-2-UPG    Alternate identifier for UCS B440 M1 Blade Server
N20-B6740-2=       UCS B440 M1 Blade Server
N20-B6740-2D       Alternate identifier for UCS B440 M1 Blade Server
UCS-B440M2-VCDL1   Alternate identifier for UCS B440 M2 Blade Server

Problem Description

Failure of a MOSFET power transistor on the blade server may cause the component to overheat and emit a short flash, and may lead to board failure. In some circumstances, the failure may also affect other blades in the chassis by disrupting power flow.

Update 1:
The firmware upgrade initially prescribed in this Field Notice has successfully detected component failures and shut down servers as expected. Since this upgrade was released, however, a MOSFET failure on a UCS B440 Blade Server has resulted in a second thermal event. Cisco has determined that a hardware modification to the UCS B440 Blade Server is appropriate. No other UCS hardware is affected. Although a UCS B440 Blade Server hardware replacement program was launched, this program is now over, and a standard RMA will address the issue.

Update 2:
The hardware version identification process has been updated to note a software defect preventing proper version display in UCSM.

Background

A failure has been observed in which a MOSFET power transistor overheated and emitted a flash before failing. A firmware fix has been developed as a preventive measure to avoid the overheating and flash event if such a failure occurs.

There is no indication of a systemic issue with the MOSFET components, and the observed failure in the field is considered to be a random component failure. The firmware upgrade is intended to be a preventative measure to avoid shorted out components and any effects on other installed elements within the UCS chassis.

Update 1:
The firmware upgrade initially prescribed in this Field Notice has successfully detected FET failures and shut down servers, preventing a potential thermal event. Since this upgrade was released, however, a FET failure on a UCS B440 Blade Server has resulted in a second incident as described above. No other UCS hardware was affected by this issue.

Update 2:
The majority of UCS B440 Blade Servers affected by this issue have been addressed through direct contact by Cisco or an authorized Cisco partner and/or by the now closed upgrade program. Although Cisco has made every effort to proactively track down and replace affected hardware, defective units may still be deployed in the field.

Cisco customers with affected Blade Servers who have not yet taken advantage of the proactive replacement program can still obtain a replacement through the standard RMA ordering process.

Problem Symptoms

There is no symptom during normal operation. If the MOSFET fails in a shorted mode, it may emit a flash, and the failure will lead to a system board failure.

Workaround/Solution

Cisco recommends replacing UCS B440 M1 and UCS B440 M2 Blade Servers at hardware version level 01 with blades at version level 02 or later. See the How To Identify Hardware Levels section to determine whether a version 01 UCS B440 M1 or M2 Blade Server is installed. Blades at hardware version 01 are eligible for replacement through the standard RMA process:

1. Open a Cisco TAC case.
2. Provide the version level information needed by TAC. See the "How To Identify Hardware Levels" section below.
3. Provide "FN-63430" as a reference number.
4. An RMA will be created and a replacement part delivered.

DDTS

To follow the bug ID link below and see detailed bug information, you must be a registered customer and you must be logged in.

DDTS Description
CSCtz65329 (registered customers only) Part number in UCSM doesn't match mctools

How To Identify Hardware Levels

UCS B440 M1 version 01 Blade Servers should be replaced with UCS B440 M1 version 02 or later. UCS B440 M2 version 01 Blade Servers should be replaced with UCS B440 M2 version 02 or later. Use any of the following methods to determine whether you have a version 01 Blade Server requiring replacement:

Method 1: External Visual Inspection
The bezel of a UCS B440 M1 or B440 M2 version 01 Blade Server is printed with black characters on a light background. Version 01 Blade Servers should be replaced.

The bezel of a UCS B440 M1 or B440 M2 version 02 or later Blade Server is printed with light characters on a black background.

Method 2: Part Number Label
The Cisco Product ID/Version ID (PID/VID) label displays hardware Version ID (VID) 02 or later for units containing the hardware update for this issue.

Method 3: UCS Manager Display
There is a software defect in UCS Manager (UCSM) releases earlier than 2.0(5b), 2.1(1f), and 2.2(1b) that causes the version information to be displayed incorrectly. Do not use the UCSM display to determine the hardware version unless you are running 2.0(5b), 2.1(1f), 2.2(1b), or higher.

When using an appropriate version of UCSM, a UCS B440 Blade Server at Version ID (VID) level 02 or later will be identified. An older UCS B440 at Version ID (VID) level 01 will not be displayed and requires an RMA replacement.

The following defect is related to the noted UCSM display issue:

CSCue46600 B440 VID is not displayed properly in UCSM
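
The release check in Method 3 can be sketched in Python. This is a minimal illustration and not a Cisco tool: the function name ucsm_vid_display_is_reliable is hypothetical, and comparing each release train (2.0, 2.1, 2.2) against its own minimum is one reading of "2.0(5b), 2.1(1f), 2.2(1b) or higher". Version strings are assumed to look like 2.1(1f).

```python
import re

# Minimum fixed release per train, taken from the Method 3 text.
# Treating each train separately is an interpretation, not a Cisco statement.
MIN_FIXED = {(2, 0): (5, "b"), (2, 1): (1, "f"), (2, 2): (1, "b")}


def ucsm_vid_display_is_reliable(version: str) -> bool:
    """Return True if the release is at or above the fixed release for its train."""
    match = re.fullmatch(r"(\d+)\.(\d+)\((\d+)([a-z]*)\)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized UCSM version: {version!r}")
    train = (int(match.group(1)), int(match.group(2)))
    build = (int(match.group(3)), match.group(4))
    if train in MIN_FIXED:
        return build >= MIN_FIXED[train]
    # Trains newer than 2.2 are assumed to carry the fix; older ones are not.
    return train > max(MIN_FIXED)
```

If the function returns False for the installed release, one of the other methods should be used to determine the hardware version.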

Method 4: Retrieving Blade Type via Show Tech Support
This method confirms a version 02 Blade Server by retrieving the board part number through the show tech-support command.

Example:

server-A# connect local-mgmt
server-A(local-mgmt)# show tech-support chassis 1 cimc 1 detail
The detailed tech-support information is located at
workspace:///techsupport/20120524222323_server_BC1_CIMC01.tar
server-A(local-mgmt)#
server-A(local-mgmt)# cd techsupport/
server-A(local-mgmt)# copy ./20120524222323_server_BC1_CIMC01.tar scp://root@192.168.1.2/tftpboot/
(Note: use your own destination address.)

After copying off and untarring the tech support file, open and search the Blade Details txt file as shown below:

20120524222323_server_BC1_CIMC01/tmp/CIMC1_TechSupport.txt

Search for ShowFru and find Board Part Number. Example:

======================[ Board Area ]=========================
Language Code : English
MFG Date / Time : 02 0A 0C
Board MFG Type / Len : [Cisco Systems Inc]
Board Product Name : [B440-BASE-M2] <<< Blade type
Board Serial Number : [FCH160671UP]
Board Part Number : [73-13497-03] <<< Part Number
Board FRU File ID : [EM-2]
BOM/Hw/PID : [30 34 01 56 30 30 ]
CLEI Code : [0000000000]
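
For many blades, the search can be scripted. The following Python sketch pulls the bracketed Board Area fields out of the saved tech-support text, assuming the file layout matches the excerpt above. The function name parse_board_area is hypothetical, and treating the next line that begins with "===" as the end of the Board Area is an assumption about the file format.

```python
import re


def parse_board_area(text: str) -> dict:
    """Collect 'Label : [value]' pairs from the Board Area of ShowFru output."""
    fields = {}
    in_board_area = False
    for line in text.splitlines():
        if "[ Board Area ]" in line:
            in_board_area = True
            continue
        if in_board_area and line.startswith("==="):
            break  # assumed: the next section header ends the Board Area
        if in_board_area:
            # Only bracketed values are captured; trailing notes are ignored.
            match = re.match(r"\s*(.+?)\s*:\s*\[(.*?)\]", line)
            if match:
                fields[match.group(1)] = match.group(2)
    return fields
```

Given the text of CIMC1_TechSupport.txt, parse_board_area(text)["Board Part Number"] yields the part number without the surrounding brackets.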

Key:
Original M1 B440 Part Numbers: 73-12462-01 to 73-12462-08
Updated M1 B440 Part Numbers: 73-14927-01 and later
Original M2 B440 Part Numbers: 73-13497-01 to 73-13497-02
Updated M2 B440 Part Numbers: 73-13497-03 and later
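
The Key can be applied mechanically. The Python sketch below classifies a Board Part Number against those ranges. The function name classify_board_part_number and the "original"/"updated"/"unknown" labels are hypothetical. "And later" is read as any two-digit revision suffix at or above the listed one, which is an assumption.

```python
import re

# Ranges copied from the Key: base part number -> revision suffix bounds.
ORIGINAL = {"73-12462": (1, 8), "73-13497": (1, 2)}
UPDATED_FROM = {"73-14927": 1, "73-13497": 3}  # "and later" (assumed open-ended)


def classify_board_part_number(part_number: str) -> str:
    """Return 'original', 'updated', or 'unknown' for a Board Part Number."""
    match = re.fullmatch(r"\[?(\d{2}-\d{5})-(\d{2})\]?", part_number.strip())
    if match is None:
        return "unknown"
    base, revision = match.group(1), int(match.group(2))
    if base in UPDATED_FROM and revision >= UPDATED_FROM[base]:
        return "updated"
    bounds = ORIGINAL.get(base)
    if bounds and bounds[0] <= revision <= bounds[1]:
        return "original"
    return "unknown"
```

A result of "original" corresponds to the part numbers this notice lists for replacement. "unknown" means the value falls outside the ranges in the Key and should be checked manually.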

For More Information

If you require further assistance, or if you have any further questions regarding this field notice, please contact the Cisco Systems Technical Assistance Center (TAC) by one of the following methods:

Receive Email Notification For New Field Notices

Cisco Notification Service—Set up a profile to receive email updates about reliability, safety, network security, and end-of-sale issues for the Cisco products you specify.