Cisco MGX 8800 Series Switches

Field Notice: FN - 63048 - MGX-RPM-XF-512 SAR Memory Corruption - Fix on Failure

Field Notice:

September 23, 2010


NOTICE:

THIS FIELD NOTICE IS PROVIDED ON AN "AS IS" BASIS AND DOES NOT IMPLY ANY KIND OF GUARANTEE OR WARRANTY, INCLUDING THE WARRANTY OF MERCHANTABILITY. YOUR USE OF THE INFORMATION ON THE FIELD NOTICE OR MATERIALS LINKED FROM THE FIELD NOTICE IS AT YOUR OWN RISK. CISCO RESERVES THE RIGHT TO CHANGE OR UPDATE THIS FIELD NOTICE AT ANY TIME.

Revision History

Revision | Date | Comment
1.0 | September 23, 2010 | Initial Public Release

Products Affected

Products Affected | Comments
MGX-RPM-XF-512 | replaced by 800-09307-10
MGX-RPM-XF-512= | replaced by 800-09307-10

Problem Description

MGX-RPM-XF-512 boards shipped after Dec 1, 2006 use a specific version of SRAM that may be affected by power supply noise. The noise can corrupt the SAR controller's internal memory register and cause the board to crash.

Background

All boards at version (TAN) 800-09307-08 that shipped after Dec 1, 2006 are subject to this condition.

Version (TAN) 800-09307-09 was released on Oct 23, 2007, but no boards at that version were built or shipped to customers.

After February 19, 2008, boards built under TAN 800-09307-10 are not subject to this condition.

Problem Symptoms

Between mid-2007 and early 2008, Cisco MGX customers experienced intermittent failures on MGX-RPM-XF-512 boards.

The failures appeared as SAR resets and occasional board resets during normal operation. They are soft failures, and the board functions normally after a reset.

Either a SAR reset or a board reset causes a traffic-impacting event. Each time a board resets, the system log shows an interrupt received from the segmentation SAR.

The error will also be present in a crash file named sar_mxt4600_tx_info, as shown below. The crash file is stored in bootflash memory.

An example of a SAR crash message from the crash file:

SWITCH_IF-3-INTERRUPT: Received interrupt from SEGMENTATION SAR - OCTRAP register = 0x000004BD
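
A saved system log or crash file can be searched for this message with a short script. The following is a minimal sketch, not a Cisco tool: the pattern is built only from the sample message above, so other software releases may format the message differently, and the function name find_sar_interrupts is hypothetical.

```python
import re

# Pattern derived from the single sample message in this notice (an assumption;
# other releases may word or space the message differently).
SAR_INTERRUPT = re.compile(
    r"SWITCH_IF-3-INTERRUPT: Received interrupt from SEGMENTATION SAR"
    r" - OCTRAP register = (0x[0-9A-Fa-f]+)"
)


def find_sar_interrupts(log_text):
    """Return the OCTRAP register values (as integers) found in log_text."""
    return [int(m.group(1), 16) for m in SAR_INTERRUPT.finditer(log_text)]


sample = (
    "SWITCH_IF-3-INTERRUPT: Received interrupt from SEGMENTATION SAR"
    " - OCTRAP register = 0x000004BD"
)
for value in find_sar_interrupts(sample):
    print(f"SAR interrupt, OCTRAP register = 0x{value:08X}")
```

An empty result only means the message was not found in the text supplied; it does not show that the board is unaffected.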

In some cases, the SAR crashinfo file written to the bootflash may be incomplete. The file can be viewed using the show bootflash: command. The cyclic redundancy check (CRC) of an incomplete file is shown as FFFFFFFF (0xFFFFFFFF), as in the example below:

2 E. unknown FFFFFFFF 4ABB9C 35 108520 Jun 25 2007 16:58:33 +00:00 sar_mxt4600_tx_info_20070625-165833
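
Captured show bootflash: output can be screened for incomplete SAR crashinfo files along the same lines. This sketch assumes, based only on the one sample line above, that the file name is the last whitespace-separated field and that the CRC appears as a separate field equal to FFFFFFFF. The function name incomplete_sar_crashinfo and the second listing line in the usage example are hypothetical.

```python
def incomplete_sar_crashinfo(bootflash_output):
    """Return names of SAR crashinfo files whose listing line shows a CRC of FFFFFFFF.

    Assumes the layout of the sample line in this notice: file name last,
    CRC as a standalone whitespace-separated field.
    """
    names = []
    for line in bootflash_output.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        name = fields[-1]
        if name.startswith("sar_mxt4600_tx_info") and "FFFFFFFF" in fields[:-1]:
            names.append(name)
    return names


listing = "\n".join([
    # Sample line from this notice.
    "2 E. unknown FFFFFFFF 4ABB9C 35 108520 Jun 25 2007 16:58:33 +00:00 "
    "sar_mxt4600_tx_info_20070625-165833",
    # Hypothetical line with an ordinary CRC, for contrast only.
    "3 .. unknown 1A2B3C4D 4C0000 35 20000 Jun 26 2007 10:00:00 +00:00 "
    "sar_mxt4600_tx_info_20070626-100000",
])
print(incomplete_sar_crashinfo(listing))
```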

Another symptom that may be observed is SAR buffer exhaustion. In that case, the show controller Switch1 command reports class 2 buffer usage for the Segmentation SAR above 100 percent.

Workaround/Solution

Cisco has proactively replaced all suspect boards in customer networks, and the proactive replacement program for this issue is closed. If a suspect board is still in service, Cisco recommends replacing it; use the table below to determine whether a board is affected. Follow the Cisco RMA Procedure to request a replacement, enter the failure code as Field Notice Alert, and refer to this FN number.

As of approximately February 19, 2008, new products manufactured under Engineering Change Order (ECO) E092444 are guaranteed to be free of this problem. Parts remanufactured under this ECO are upgraded to Version -10.

Refer to How To Identify Hardware Levels for instructions on how to view the version of an in-service product. Boards with matching serial numbers that have been remanufactured to Version -10 are free of this defect and do not need to be replaced.

Part Number or TAN | Steps | Action
800-09307-08 or lower | Check Serial Number Tool | If affected, request replacement.
800-09307-10 or higher | No check required. | Not affected. No action required.

Note: 800-09307-09 was not shipped to customers.
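
When many boards must be triaged, the table above can be expressed as a small lookup. This is a sketch under stated assumptions: the function name action_for_tan is hypothetical, the TAN is assumed to be written as 800-09307-NN with a two-digit version suffix, and the message returned for version -09 is wording added here, not taken from the table.

```python
import re


def action_for_tan(tan):
    """Map a TAN such as '800-09307-08' to the step and action from the table."""
    match = re.fullmatch(r"800-09307-(\d{2})", tan.strip())
    if match is None:
        raise ValueError(f"not an MGX-RPM-XF-512 TAN in the expected form: {tan!r}")
    version = int(match.group(1))
    if version >= 10:
        return "No check required. Not affected. No action required."
    if version == 9:
        # Version -09 was not shipped to customers; this wording is an assumption.
        return "Unexpected: 800-09307-09 was not shipped to customers. Recheck the TAN."
    return "Check Serial Number Tool. If affected, request replacement."


for tan in ("800-09307-08", "800-09307-10"):
    print(tan, "->", action_for_tan(tan))
```

A result of "Check Serial Number Tool" is not a verdict on its own: the serial number still decides whether a -08 or lower board is affected.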

For failures observed that are related to this Field Notice, follow the Cisco RMA Procedure.

DDTS

To follow the bug ID link below and see detailed bug information, you must be a registered customer and you must be logged in.

DDTS Description
CSCsk04437 (registered customers only)  

How To Identify Hardware Levels

  1. Command and output to identify serial number and TAN:



    The above example shows a suspected board.

    Note: The boards with matching serial numbers and part number 800-09307-10 are already upgraded and do not need to be returned.

  2. Identifying the serial number on the physical board.

    See the location circled in the picture below.



    Note: The boards with matching serial numbers and part number 800-09307-10 are already upgraded and do not need to be returned.

For More Information

If you require further assistance, or if you have any further questions regarding this field notice, please contact the Cisco Systems Technical Assistance Center (TAC) by one of the following methods:
