Cisco UCS B-Series Blade Servers

Field Notice: FN - 63387 - UCS B-Series and C-Series Servers Log Memory Errors due to Intel 5600 Erratum Issue

Revised November 8, 2013
January 11, 2011


NOTICE:

THIS FIELD NOTICE IS PROVIDED ON AN "AS IS" BASIS AND DOES NOT IMPLY ANY KIND OF GUARANTEE OR WARRANTY, INCLUDING THE WARRANTY OF MERCHANTABILITY. YOUR USE OF THE INFORMATION ON THE FIELD NOTICE OR MATERIALS LINKED FROM THE FIELD NOTICE IS AT YOUR OWN RISK. CISCO RESERVES THE RIGHT TO CHANGE OR UPDATE THIS FIELD NOTICE AT ANY TIME.

Revision History

Revision  Date         Comment
1.1       08-NOV-2013  Add clarity with regards to the need to disable the C6 State
1.0       11-JAN-2011  Initial Public Release

Products Affected

Products Affected          Comments
UCSB - N20-B6625-1         UCS B200 M2 Blade Server
UCSB - N20-B6625-1-UPG     UCS B200 M2 Blade Server
UCSB - N20-B6625-1=        UCS B200 M2 Blade Server
UCSB - N20-B6625-1D        UCS B200 M2 Blade Server
UCSC - R200-1120402W       UCS C200 M2 Rack Server
UCSC - R200-1120402W-CH    UCS C200 M2 Rack Server
UCSC - R200-1120402W=      UCS C200 M2 Rack Server
UCSC - R210-2121605W       UCS C210 M2 Rack Server
UCSC - R210-2121605W=      UCS C210 M2 Rack Server
UCS - UC-N20-B6625-1       UCS B200 M2 Blade Server
UCS - UC-R200-1120402W     UCS C200 M2 Rack Server
UCS - UC-R210-2121605W     UCS C210 M2 Rack Server
UCS - UCS-B200M2-VCS1      UCS B200 M2 Blade Server
UCS - UCS-C200M2-VCD2      UCS C200 M2 Rack Server
UCS - UCS-C210M2-VCD2      UCS C210 M2 Rack Server

Problem Description

Cisco UCS servers running Intel Xeon 5600 CPUs may log correctable or uncorrectable memory errors at runtime due to an interaction with the Advanced Configuration and Power Interface (ACPI) C3 state. With ACPI C3 (the Intel C6 power state) enabled, a server may be exposed to an Intel 5600 series processor circuit marginality erratum, which could cause the processor to corrupt memory on a write cycle and subsequently report an uncorrectable error when this memory is re-read.

Notes:
1. Errors prior to runtime (i.e. POST of server) are not related to this issue.
2. There may be other reported memory errors that are not necessarily related to this issue.
3. This issue affects UCS B200 M2, C200 M2, and C210 M2 models supporting Intel Xeon 5600 CPUs.
4. UCS B250 M2 and C250 M2 units support Intel 5600, but are not affected because C6 is not implemented at the same level.
5. If a UCS B200 M1, C200 M1, or C210 M1 has been modified to run with Intel Xeon 5600, the system may be affected by this issue.
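
Notes 3 through 5 amount to a simple scoping rule, which can be sketched in a few lines of Python. This is only an illustration: the function name and the model strings are hypothetical and are not part of any Cisco tool.

```python
# Model sets taken from notes 3-5 above; all names here are illustrative.
AFFECTED_M2 = {"B200 M2", "C200 M2", "C210 M2"}
UNAFFECTED_M2 = {"B250 M2", "C250 M2"}          # C6 not implemented at the same level
UPGRADED_M1 = {"B200 M1", "C200 M1", "C210 M1"}  # only if modified to run Xeon 5600


def may_be_affected(model: str, has_xeon_5600: bool) -> bool:
    """Return True if the notes above place this server in scope."""
    if not has_xeon_5600:
        return False
    if model in UNAFFECTED_M2:
        return False
    return model in AFFECTED_M2 or model in UPGRADED_M1
```

For example, may_be_affected("B250 M2", True) returns False, while an M1 system modified to run a Xeon 5600 is treated as potentially affected.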

Background

On a subset of Intel Xeon 5600 series processors, CPU circuit marginality or internal signaling noise during package C6 transitions may lead to an invalid DRAM state, system hang, reboot, memory ECC errors, or unpredictable system behavior.

This issue affects only the Intel Xeon 5600 series processors in systems that are running with C6 enabled, and the memory errors are reported at runtime. Memory failures reported in POST are not related to this issue.

This issue is not unique to Cisco UCS servers. Other platforms using Intel Xeon 5600 series processors may potentially experience the same issue.

Additional Reference:
Intel's document 'Intel Xeon Processor 5600 Series Specification Update', at:
http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-5600-specification-update.pdf
See errata number BD103/104

Problem Symptoms

1. Uncorrectable (and possibly correctable) errors are generated by the Xeon 5600 processor errata rather than by actual DRAM failures. They appear in the System Event Log (SEL) in UCS Manager as 'Uncorrectable ECC/other uncorrectable memory error'.

And/Or

2. Server operating system halts and server hangs, or reboots.

The following errors do not apply:
- Errors at POST
- Errors that occur when an OS is not running
- Errors that occur when the Intel C6 feature is disabled

Only errors that occur during runtime apply.
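
When triaging logged errors, the exclusions above can be applied as a filter. The following Python sketch assumes a hypothetical event record with three flags. How those flags are obtained from the SEL or from the server configuration is not covered here.

```python
from dataclasses import dataclass


@dataclass
class MemoryErrorEvent:
    """Hypothetical record for one logged memory error."""
    during_post: bool   # error was reported during POST
    os_running: bool    # an operating system was running at the time
    c6_enabled: bool    # Intel C6 feature was enabled on the server


def in_scope(event: MemoryErrorEvent) -> bool:
    """True only for runtime errors on a server with C6 enabled."""
    return (not event.during_post) and event.os_running and event.c6_enabled
```

An event that fails this check may still indicate a real memory problem. It is simply not explained by this field notice.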

Workaround/Solution

First, use UCS Manager, CIMC, or the CLI to identify the server's processor and BIOS configuration and determine whether the server could be affected. If the server has C6 enabled and is running a BIOS version earlier than those noted below, proceed to the solution or workaround procedure. Errors caused by this issue are not due to defective memory hardware, so do not replace DIMMs unless this issue has been ruled out by applying the solution or workaround procedures below.

Solution:
This behavior is corrected in the following UCS software versions:

Unified Computing System Blade BIOS versions bundled with release 1.3(1p) and later
Unified Computing System Blade BIOS versions bundled with release 1.4(1i) and later
Unified Computing System C-Series Server BIOS bundled with release 1.2(2d) and later
Note: Use UCS B-Series release 1.4(1i) or later for C-Series servers running in UCSM Managed mode.

BIOS is bundled in larger UCS software packages. For clarity, specific minimum BIOS versions per platform are listed here:

For UCS B-Series Release 1.3(1p):
B200-M2 - S5500.1.3.1g.0.120720100638 and later

For UCS B-Series release 1.4(1i):
B200-M2 - S5500.1.4.1f.0.120820101100 and later

For UCS C-Series release 1.2(2d):
C200 - C200.1.2.2f.0.112720102041 and later
C210 - C200.1.2.2f.0.112720102041 and later
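
A BIOS string can be checked against these minimums with a short script. The Python sketch below rests on an assumption: it reads the dotted fields as platform, major, minor, patch, a fixed 0, and a build stamp. That layout is inferred from the strings listed above and is not documented in this notice. A release train newer than every listed one is treated as "later". The function names are illustrative.

```python
import re

# Minimum fixed patch level per release train, keyed by BIOS platform prefix.
# Values are transcribed from the versions listed above.
MIN_FIXED = {
    "S5500": {(1, 3): (1, "g"), (1, 4): (1, "f")},  # B200-M2
    "C200": {(1, 2): (2, "f")},                     # C200 and C210
}


def parse_bios(version: str):
    """Split a BIOS string into (platform, train, patch). Layout is assumed."""
    platform, major, minor, patch, _zero, _build = version.split(".")
    match = re.fullmatch(r"(\d+)([a-z]*)", patch)
    if match is None:
        raise ValueError(f"unrecognized patch field: {patch!r}")
    return platform, (int(major), int(minor)), (int(match.group(1)), match.group(2))


def has_fix(version: str) -> bool:
    """True if the BIOS string is at or beyond a listed minimum."""
    platform, train, patch = parse_bios(version)
    minimums = MIN_FIXED.get(platform)
    if minimums is None:
        raise ValueError(f"no minimum listed for platform {platform!r}")
    if train in minimums:
        return patch >= minimums[train]
    # Unlisted train: fixed only if newer than every listed train.
    return train > max(minimums)
```

With this reading, has_fix("S5500.1.3.1g.0.120720100638") is true. Any string that does not match the assumed six-field layout raises ValueError, which is preferable to guessing.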

Upgrade procedures for Cisco UCS B-Series Blade Servers can be found here:
http://www.cisco.com/en/US/products/ps10280/tsd_products_support_install_and_upgrade.html

Upgrade procedures for Cisco UCS C-Series Rack Servers can be found here:
http://www.cisco.com/en/US/products/ps10493/tsd_products_support_install_and_upgrade.html

Note: The solution is included in the CPU microcode, which is bundled in the BIOS. The BIOS must be upgraded for the solution to be applied. If the server runs a BIOS version earlier than those listed above, first upgrade the BIOS to a version that supports Intel Errata BD103/104 (listed versions or later). If the server has the Processor C6 state set to "enabled", proceed to the workaround procedure described below after the BIOS update.
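
The ordering in this note can be expressed as a small decision helper. The Python sketch below is one reading of the note, not an official procedure: it takes two facts about a server and returns the actions in the order the note gives them. The function name and the action strings are illustrative.

```python
from typing import List


def recommended_actions(bios_has_fix: bool, c6_enabled: bool) -> List[str]:
    """Return the actions from the note above, BIOS upgrade first."""
    actions = []
    if not bios_has_fix:
        actions.append("upgrade BIOS to a listed version or later")
    if c6_enabled:
        # Per the note, the workaround follows the BIOS update.
        actions.append("disable Processor C6 (workaround)")
    return actions
```

An empty list means neither step applies: the BIOS already carries the fix and C6 is already disabled.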


Workaround:
This issue can be avoided by disabling the Intel C6 feature. This may be done on each server directly, or via a BIOS policy attached to a service profile on UCS Blade Servers. A direct BIOS setup change is the only option for UCS C-Series rack servers that are not running in UCSM managed mode. Both procedures are detailed below. Disabling C6 via BIOS policy is the preferred method when managing servers through UCSM.


Steps for Disabling Intel C6 Feature via BIOS Policy through CLI:

1. Create a BIOS policy:
# scope org /
# create bios-policy <bios policy name>
* # set processor-c6-report-config processor-c6-report disabled
* # commit-buffer

2. Apply this policy to a Service Profile:
# scope org <org>
# scope service-profile <name>
# set bios-policy <bios policy name>
* # commit-buffer

3. Reboot this server from OS or UCSM.
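
When several service profiles need the same policy, the command sequence above can be generated rather than typed by hand. The Python sketch below only builds the command text from steps 1 and 2. It does not connect to UCSM, and the function name, policy name, and profile names are hypothetical.

```python
from typing import List


def bios_policy_commands(policy: str, org: str, profiles: List[str]) -> List[str]:
    """Build the CLI lines from steps 1 and 2 for the given service profiles."""
    # Step 1: create the BIOS policy with C6 reporting disabled.
    commands = [
        "scope org /",
        f"create bios-policy {policy}",
        "set processor-c6-report-config processor-c6-report disabled",
        "commit-buffer",
    ]
    # Step 2: attach the policy to each service profile.
    for profile in profiles:
        commands += [
            f"scope org {org}",
            f"scope service-profile {profile}",
            f"set bios-policy {policy}",
            "commit-buffer",
        ]
    return commands


if __name__ == "__main__":
    # Example with made-up names; review the output before pasting it into a session.
    print("\n".join(bios_policy_commands("no-c6", "/", ["sp-blade-1", "sp-blade-2"])))
```

The reboot in step 3 is left out on purpose, since it should be scheduled per server.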


Steps for Disabling the Intel C6 feature via BIOS policy through the UCSM Graphical User Interface can be found at:
http://www.cisco.com/en/US/products/ps10281/products_configuration_example09186a0080b36538.shtml



Steps for Disabling Intel C6 Feature Directly:

1. Reboot the system and press F2 to enter setup.

2. Go to Advanced and select Processor Configuration.

3. Go to Processor C6 and disable it.

4. Press F10 to save and exit.

DDTS

To follow the bug ID link below and see detailed bug information, you must be a registered customer and you must be logged in.

DDTS                                     Description
CSCtj38908 (registered customers only)   Processor Microcode Update 0000206C2_000000013

For More Information

If you require further assistance, or if you have any further questions regarding this field notice, please contact the Cisco Systems Technical Assistance Center (TAC).

Receive Email Notification For New Field Notices

Cisco Notification Service—Set up a profile to receive email updates about reliability, safety, network security, and end-of-sale issues for the Cisco products you specify.