Field Notice: FN - 63301 - CRS: Certain 8-10GBE boards may have fuse reliability issue which can degrade over time - Fix on Failure

Revised October 29, 2012

August 2, 2010


NOTICE:

THIS FIELD NOTICE IS PROVIDED ON AN "AS IS" BASIS AND DOES NOT IMPLY ANY KIND OF GUARANTEE OR WARRANTY, INCLUDING THE WARRANTY OF MERCHANTABILITY. YOUR USE OF THE INFORMATION ON THE FIELD NOTICE OR MATERIALS LINKED FROM THE FIELD NOTICE IS AT YOUR OWN RISK. CISCO RESERVES THE RIGHT TO CHANGE OR UPDATE THIS FIELD NOTICE AT ANY TIME.

Revision History

Revision   Date          Comment
1.1        29-OCT-2012   Removed UMPIRE form.
1.0        02-AUG-2010   Initial Public Release

Products Affected

Products Affected   Top Assembly Part #   Rev.     Printed Circuit Assembly Part #   Rev.
8-10GBE(=)          800-24545-07          Rev A0   73-9231-10                        Rev 10

Engineering Change Order (ECO)

Engineering Change Order (ECO)   Remarks
E097834                          CRS1 8-10GBE Fuse Design Improvements

Problem Description

CRS1 8-10GBE Line Cards (LCs) built between 9th January 2006 and 5th October 2007 may experience a fuse failure that causes the board to cease operating. Under certain conditions, the failure can occur on installed boards that are in steady-state operation. Failures may also occur during power-up or during online insertion and removal (OIR).

Note: There are no safety concerns regarding the failure mode of this fuse.

Background

In 2006, a new fuse was introduced on the 8-10GBE Line Card to meet industry regulations. In one specific placement on the 8-10GBE, this fuse has shown long-term reliability issues, and failures can occur after OIR operations.

The rate of degradation varies with the variability of the fuse's metallic layers and with the ambient temperature of the CRS system. Ambient temperatures of 35 °C and above increase the likelihood of the fuse failing.

Problem Symptoms

The CRS1 8-10GBE board will fail to power up when the fuse has failed.

Sample error log when the LC fuse fails during operation:

SP/0/0/SP:Jan 28 02:49:25.369 : i2c_server[60]: %PLATFORM-I2C-6-LC_POWER_FAIL : LC power-up failed because - 5V_A or 5V_B or 5V_C is bad - as indicated by power status registers
SP/0/0/SP:Jan 28 02:49:25.380 : i2c_server[60]: %PLATFORM-I2C-6-LC_POWER_FAIL : LC power-up failed because - 1.5V or 1.8V or 3.3V is bad - as indicated by power status registers
SP/0/0/SP:Jan 28 02:49:25.381 : i2c_server[60]: %PLATFORM-I2C-6-LC_POWER_FAIL : LC power-up failed because - egress-pse power is bad - as indicated by power status registers
SP/0/0/SP:Jan 28 02:49:25.382 : i2c_server[60]: %PLATFORM-I2C-6-LC_POWER_FAIL : LC power-up failed because - CPU power is bad - as indicated by power status registers
SP/0/0/SP:Jan 28 02:49:25.383 : i2c_server[60]: %PLATFORM-I2C-6-LC_POWER_FAIL : LC power-up failed because - PLIM power is bad - as indicated by power status registers
SP/0/0/SP:Jan 28 02:49:25.384 : i2c_server[60]: %PLATFORM-I2C-7-LC_BAD_VRM_INFO : LC power-up failed because - 1.6V1 on CPU is bad - as indicated by power good registers
SP/0/0/SP:Jan 28 02:49:25.384 : i2c_server[60]: %PLATFORM-I2C-7-LC_BAD_VRM_INFO : LC power-up failed because - 1.8V on CPU is bad - as indicated by power good registers
SP/0/0/SP:Jan 28 02:49:25.385 : i2c_server[60]: %PLATFORM-I2C-7-LC_BAD_VRM_INFO : LC power-up failed because - 2.5V on CPU is bad - as indicated by power good registers
SP/0/0/SP:Jan 28 02:49:25.386 : i2c_server[60]: %PLATFORM-I2C-7-LC_BAD_VRM_INFO : LC power-up failed because - 3.3V on CPU is bad - as indicated by power good registers
SP/0/0/SP:Jan 28 02:49:25.388 : i2c_server[60]: %PLATFORM-I2C-7-LC_BAD_VRM_INFO : LC power-up failed because - 1.2V on METRO1 is bad - as indicated by power good registers
SP/0/0/SP:Jan 28 02:49:25.388 : i2c_server[60]: %PLATFORM-I2C-7-LC_BAD_VRM_INFO : LC power-up failed because - 2.5V on METRO1 is bad - as indicated by power good registers
SP/0/0/SP:Jan 28 02:49:25.389 : i2c_server[60]: %PLATFORM-I2C-7-LC_BAD_VRM_INFO : LC power-up failed because - 5V_C on LC is bad - as indicated by power good registers
SP/0/0/SP:Jan 28 02:49:25.390 : i2c_server[60]: %PLATFORM-I2C-7-LC_BAD_VRM_INFO : LC power-up failed because - 3.3V on LC is bad - as indicated by power good registers
SP/0/0/SP:Jan 28 02:49:25.391 : i2c_server[60]: %PLATFORM-I2C-7-LC_BAD_VRM_INFO : LC power-up failed because - 1.8V on LC is bad - as indicated by power good registers
SP/0/0/SP:Jan 28 02:49:25.391 : i2c_server[60]: %PLATFORM-I2C-7-LC_BAD_VRM_INFO : LC power-up failed because - 1.5V on LC is bad - as indicated by power good registers
SP/0/0/SP:Jan 28 02:49:25.392 : i2c_server[60]: %PLATFORM-I2C-7-LC_BAD_VRM_INFO : LC power-up failed because - power-supply on PLIM is bad - as indicated by power good registers
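
When collected syslog output needs to be searched for this symptom, a small script can list which power rails were reported bad. The following is a minimal Python sketch, not a Cisco tool: the function name failed_rails and the regular expression are assumptions based only on the two message types shown in the sample above. A match indicates the symptom only; it does not by itself confirm that the fuse is the cause.

```python
import re

# Matches the two message types shown in the sample log
# (LC_POWER_FAIL and LC_BAD_VRM_INFO). The second capture group is the
# rail description between "because - " and " is bad".
RAIL_RE = re.compile(
    r"%PLATFORM-I2C-\d-(LC_POWER_FAIL|LC_BAD_VRM_INFO)\s*:\s*"
    r"LC\s+power-up\s+failed\s+because\s+-\s+(.+?)\s+is\s+bad"
)


def failed_rails(log_text):
    """Return (message_type, rail) pairs found in log_text (hypothetical helper)."""
    # Collapse all whitespace so entries wrapped across lines still match.
    flat = " ".join(log_text.split())
    return [(m.group(1), m.group(2)) for m in RAIL_RE.finditer(flat)]


sample = """
SP/0/0/SP:Jan 28 02:49:25.382 : i2c_server[60]:%PLATFORM-I2C-6-LC_POWER_FAIL : LC
power-up failed because - CPU power is bad - as indicated by power status registers
SP/0/0/SP:Jan 28 02:49:25.389 : i2c_server[60]:%PLATFORM-I2C-7-LC_BAD_VRM_INFO : LC
power-up failed because - 5V_C on LC is bad - as indicated by power good registers
"""

for kind, rail in failed_rails(sample):
    print(kind, "->", rail)
```

Whitespace is collapsed before matching so that the script tolerates log entries that were wrapped when they were copied or captured.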


Sample error log when the LC fuse fails during a subsequent boot attempt:

LC/0/0/CPU0:Apr 12 07:23:21.449 : cpuctrl[220]: %PLATFORM-CPUCTRL-3-HW_DETECTED_ERROR_LINK : HW error interrupt link, port = 9 interrupt_id = 0x0, port_link_error = 0x00000001, port_link_crc_count = 0x00000003
LC/0/0/CPU0:Apr 12 07:23:21.458 : pse_driver[173]: %L2-PSE-7-ERR_EXIT : Exit on error: M0: Head FIFO overflow. Threshold value=0xffffffff: Caused by Input/output error : pkg/bin/pse_driver : (PID=36914) : -Traceback= 482251f0 4820f7e4 48213980 48213cb0 48214204 fc5d3dd4 fc5cd020 fc1b7f88
LC/0/0/CPU0:Apr 12 07:23:21.454 : egressq[125]: %L2-EGRESSQ-3-HW_ERROR : Sharq ENQ packet length error occurred.
RP/0/RP0/CPU0:Apr 12 07:23:35.068 : shelfmgr[333]: %PLATFORM-SHELFMGR-3-NODE_RESET_BRINGDOWN : Reset node 0/0/CPU0 due to heartbeat loss
LC/0/5/CPU0:Apr 12 07:23:38.427 : ingressq[156]: %DRIVERS-INGRESSQ_DLL-4-LNS_LOP_DROP : low availability of planes, aggr cell drop count: 110
LC/0/4/CPU0:Apr 12 07:23:38.430 : ingressq[156]: %DRIVERS-INGRESSQ_DLL-4-LNS_LOP_DROP : low availability of planes, aggr cell drop count: 130
LC/0/1/CPU0:Apr 12 07:23:38.460 : ingressq[156]: %DRIVERS-INGRESSQ_DLL-4-LNS_LOP_DROP : low availability of planes, aggr cell drop count: 356
RP/0/RP0/CPU0:Apr 12 07:23:40.219 : invmgr[205]: %PLATFORM-INV-6-NODE_STATE_CHANGE : Node: 0/0/SP, state: BRINGDOWN
RP/0/RP0/CPU0:Apr 12 07:23:40.609 : invmgr[205]: %PLATFORM-INV-6-NODE_STATE_CHANGE : Node: 0/0/CPU0, state: BRINGDOWN
RP/0/RP0/CPU0:Apr 12 07:23:40.920 : invmgr[205]: %PLATFORM-INV-6-NODE_STATE_CHANGE : Node: 0/0/CPU0, state: PRESENT
RP/0/RP0/CPU0:Apr 12 07:23:51.232 : shelfmgr[333]: %PLATFORM-MBIMGR-7-IMAGE_VALIDATED : 0/0/SP: MBI bootflash:mbis/hfr-os-mbi-3.3.1.CSCek61756-1.0.0/cfc2413f7ad0e7e65a1c7f12c0f7aec4/mbihfr-sp.vm validated
RP/0/RP0/CPU0:Apr 12 07:23:52.085 : invmgr[205]: %PLATFORM-INV-6-NODE_STATE_CHANGE : Node: 0/0/SP, state: MBI-BOOTING
RP/0/RP0/CPU0:Apr 12 07:24:08.049 : invmgr[205]: %PLATFORM-INV-6-NODE_STATE_CHANGE : Node: 0/0/SP, state: MBI-RUNNING
RP/0/RP0/CPU0:Apr 12 07:24:25.426 : invmgr[205]: %PLATFORM-INV-6-NODE_STATE_CHANGE : Node: 0/0/SP, state: IOS XR RUN
SP/0/0/SP:Apr 12 07:24:06.014 : init[65541]: %OS-INIT-7-MBI_STARTED : total time 8.478 seconds
SP/0/0/SP:Apr 12 07:24:16.489 : sysmgr[73]: %OS-SYSMGR-5-NOTICE : Card is COLD started
SP/0/0/SP:Apr 12 07:24:18.712 : init[65541]: %OS-INIT-7-INSTALL_READY : total time 21.192 seconds
SP/0/0/SP:Apr 12 07:24:34.784 : envmon[104]: %PLATFORM-CCTL-3-ERROR_EXIT : Envmon process exiting because read the board type from hardware, error code 'Subsystem(8191)' detected the 'unknown' condition 'Code(63)': Unknown Error(511)
SP/0/0/SP:Apr 12 07:24:36.588 : sysmgr[73]: envmon(1) (jid 104) abnormally terminated, restart scheduled
SP/0/0/SP:Apr 12 07:24:39.405 : sysmgr[73]: %OS-SYSMGR-3-ERROR : envmon(104) (fail count 2) will be respawned in 5 seconds
SP/0/0/SP:Apr 12 07:24:39.399 : sysmgr[73]: envmon(1) (jid 104) abnormally terminated, restart scheduled
SP/0/0/SP:Apr 12 07:24:45.363 : envmon[104]: %PLATFORM-CCTL-3-ERROR_EXIT : Envmon process exiting because read the board type from hardware, error code 'Subsystem(8191)' detected the 'unknown' condition 'Code(63)': Unknown Error(511)
SP/0/0/SP:Apr 12 07:24:45.406 : sysmgr[73]: %OS-SYSMGR-3-ERROR : envmon(104) (fail count 3) will be respawned in 5 seconds
SP/0/0/SP:Apr 12 07:24:45.402 : sysmgr[73]: envmon(1) (jid 104) abnormally terminated, restart scheduled
SP/0/0/SP:Apr 12 07:24:51.778 : envmon[104]: %PLATFORM-CCTL-3-ERROR_EXIT : Envmon process exiting because read the board type from hardware, error code 'Subsystem(8191)' detected the 'unknown' condition 'Code(63)': Unknown Error(511)
SP/0/0/SP:Apr 12 07:24:51.853 : sysmgr[73]: %OS-SYSMGR-3-ERROR : envmon(104) (fail count 4) will be respawned in 5 seconds
SP/0/0/SP:Apr 12 07:24:51.849 : sysmgr[73]: envmon(1) (jid 104) abnormally terminated, restart scheduled
SP/0/0/SP:Apr 12 07:24:57.662 : envmon[104]: %PLATFORM-CCTL-3-ERROR_EXIT : Envmon process exiting because read the board type from hardware, error code 'Subsystem(8191)' detected the 'unknown' condition 'Code(63)': Unknown Error(511)
SP/0/0/SP:Apr 12 07:24:57.714 : sysmgr[73]: %OS-SYSMGR-2-REBOOT : reboot required, process (envmon) reason (maximum restart attempts exceeded)
SP/0/0/SP:Apr 12 07:24:58.111 : sysmgr[73]: %OS-SYSMGR-3-ERROR : envmon(1) (jid 104) can not be restarted, entering slow-restart mode
SP/0/0/SP:Apr 12 07:24:58.118 : sysmgr[73]: %OS-SYSMGR-3-ERROR : envmon(104) (fail count 5) will be respawned in 30 seconds
SP/0/0/SP:Apr 12 07:24:57.709 : sysmgr[73]: envmon(1) (jid 104) abnormally terminated, restart scheduled
SP/0/0/SP:Apr 12 07:24:58.129 : sysmgr[73]: %OS-SYSMGR-3-ERROR : envmon[104] (pid 69678) has not sent proc-ready within 45 seconds
SP/0/0/SP:Apr 12 07:24:58.614 : /pkg/bin/sysmgr_log[65585]: %OS-SYSMGR-4-CHECK_LOG : /pkg/bin/shutdown_debug_script invoked by sysmgr. Reason: (envmon) maximum restart attempts exceeded, Compressed output will be saved.
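
The boot-attempt log above follows a recognizable sequence: a node reset due to heartbeat loss, repeated envmon exits, and finally a reboot request from sysmgr. The Python sketch below summarizes those three markers in a captured log. It is an illustration under stated assumptions: the function name summarize_boot_failure and the patterns are hypothetical and derived only from the sample messages in this notice, and the same messages may have other causes.

```python
import re

# Patterns taken from the sample log in this notice; "\s*" after the
# trailing hyphen tolerates mnemonics that were wrapped onto the next line.
HEARTBEAT_RE = re.compile(
    r"%PLATFORM-SHELFMGR-3-\s*NODE_RESET_BRINGDOWN\s*:\s*"
    r"Reset node (\S+) due to heartbeat loss"
)
ENVMON_EXIT_RE = re.compile(r"envmon\[\d+\]:\s*%PLATFORM-CCTL-3-\s*ERROR_EXIT")
REBOOT_RE = re.compile(
    r"%OS-SYSMGR-2-\s*REBOOT\s*:\s*reboot required, process \(envmon\)"
)


def summarize_boot_failure(log_text):
    """Count the symptom markers in log_text (hypothetical helper)."""
    flat = " ".join(log_text.split())  # undo line wrapping
    return {
        "reset_nodes": HEARTBEAT_RE.findall(flat),
        "envmon_exits": len(ENVMON_EXIT_RE.findall(flat)),
        "envmon_reboot_required": REBOOT_RE.search(flat) is not None,
    }


sample = """
RP/0/RP0/CPU0:Apr 12 07:23:35.068 : shelfmgr[333]: %PLATFORM-SHELFMGR-3-
NODE_RESET_BRINGDOWN : Reset node 0/0/CPU0 due to heartbeat loss
SP/0/0/SP:Apr 12 07:24:34.784 : envmon[104]: %PLATFORM-CCTL-3-ERROR_EXIT : Envmon
process exiting because read the board type from hardware, error code 'Subsystem(8191)' detected the
'unknown' condition 'Code(63)': Unknown Error(511)
SP/0/0/SP:Apr 12 07:24:45.363 : envmon[104]: %PLATFORM-CCTL-3-ERROR_EXIT : Envmon
process exiting because read the board type from hardware, error code 'Subsystem(8191)' detected the
'unknown' condition 'Code(63)': Unknown Error(511)
SP/0/0/SP:Apr 12 07:24:57.714 : sysmgr[73]: %OS-SYSMGR-2-REBOOT : reboot required, process (envmon) reason (maximum restart attempts exceeded)
"""

print(summarize_boot_failure(sample))
```

If all three markers appear for the same slot, the LC is a candidate for the hardware-level check described under "How To Identify Hardware Levels".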

Workaround/Solution

Cisco recommends replacing the suspect LC hardware (8-10GBE). The upgrade program is now closed, and replacement is supported through the Cisco RMA process.

Products manufactured under Engineering Change Order (ECO) E097834, from approximately 1st November 2007 onward, should be free of this problem. Refer to "How To Identify Hardware Levels" below for instructions on how to view the version and serial number.

Note: Products with ECO E097834 applied are not affected, even when the SN Validation tool reports the LC serial number as affected. Boards with TAN 800-24545-08 have ECO E097834 applied.

LC TAN or Part Number     Steps                                                 Action
800-24545-07 and lower    Check the serial number with the SN Validation tool   If affected, request replacement by following the Cisco RMA process.
800-24545-08 and higher   No check required                                     LC is good; no replacement required.
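
The decision rule above can be expressed as a short function, which is useful when many line cards have to be triaged. This is a minimal Python sketch under stated assumptions: the function name triage and the sn_affected parameter are hypothetical. The code does not query the SN Validation tool; its result must be supplied by the caller.

```python
import re

# Top assembly number (TAN) of the 8-10GBE, as listed under "Products Affected".
TAN_RE = re.compile(r"^800-24545-(\d{2})$")
FIXED_FROM = 8  # 800-24545-08 and higher have ECO E097834 applied


def triage(tan, sn_affected=None):
    """Return the action for one LC.

    sn_affected is the caller-supplied result of the SN Validation tool:
    True (affected), False (not affected), or None (not checked yet).
    """
    m = TAN_RE.match(tan.strip())
    if m is None:
        return "TAN is not an 8-10GBE TAN covered by this notice"
    if int(m.group(1)) >= FIXED_FROM:
        return "no check required; LC is good"
    if sn_affected is None:
        return "check serial number with the SN Validation tool"
    if sn_affected:
        return "request replacement through the Cisco RMA process"
    return "serial number not reported as affected; no replacement indicated"


print(triage("800-24545-08"))
print(triage("800-24545-05"))
print(triage("800-24545-05", sn_affected=True))
```

The TAN suffix is compared numerically, so any revision at or above -08 is treated as fixed regardless of the serial number, which matches the note about ECO E097834.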

How To Identify Hardware Levels

The hardware level and serial number of the 8-10GBE Line Card can be verified by running a CLI command or by physically inspecting the LC. Both methods are described below.

A) Using CLI Command:

1) Check the CRS1 8-10GBE top assembly number (TAN) by using the show diag command below. If the TAN is 800-24545-08 or higher, the 8-10GBE is already upgraded and does NOT need replacing, and no further checks are necessary.

2) If the TAN is 800-24545-07 or lower, check the serial number with the SN Validation tool. If the tool returns 'Affected', the 8-10GBE is suspect. Request replacement of the board(s) through the Cisco RMA process.

Sample 'show diag' output (in admin mode) for identifying an 8-10GBE that needs to be replaced:

RP/0/RP0/CPU0:ios#sh diag
... Output truncated.....
ECI: 173644
PLIM 0/PL0/* : Cisco CRS-1 Series 8x10GbE Interface Module
MAIN: board type 600095
800-24545-05 rev A0 <--- TAN
dev N/A
S/N SAD1nnnnnn <---- Serial Number
PCA: 73-9231-09 rev A0
PID: 8-10GBE
VID: V05
CLEI: IPUIA1CRAA
ECI: 147655
Interface port config: 8 Ports
Optical reach type: Unknown
Connector type: SC
NODE 0/0/CPU0
Node State : IOS XR RUN
PLD: Motherboard: 0x0015, Processor: 0x0015, Power: N/A
MONLIB: QNXFFS Monlib Version 3.1
ROMMON: Version 1.54(20091016:214209) [CRS-1 ROMMON]
CARD 0/1/* : Cisco CRS-1 Series Modular Services Card revision B
MAIN: board type 500063
800-27067-08 rev A0
dev N/A
S/N SAD1403008B
PCA: 73-10334-08 rev A0

Sample show diag output (in admin mode) for identifying a good 8-10GBE LC that does not need to be replaced:

RP/0/RP0/CPU0:ios#sh diag
... Output truncated.....
ECI: 173644
PLIM 0/PL0/* : Cisco CRS-1 Series 8x10GbE Interface Module
MAIN: board type 600095
800-24545-08 rev A1 <--- TAN
dev N/A
S/N SAD1nnnnnn <---- Serial Number
PCA: 73-9231-09 rev A0
PID: 8-10GBE
VID: V05
CLEI: IPUIA1CRAA
ECI: 147655
Interface port config: 8 Ports
Optical reach type: Unknown
Connector type: SC
NODE 0/0/CPU0
Node State : IOS XR RUN
PLD: Motherboard: 0x0015, Processor: 0x0015, Power: N/A
MONLIB: QNXFFS Monlib Version 3.1
ROMMON: Version 1.54(20091016:214209) [CRS-1 ROMMON]
CARD 0/1/* : Cisco CRS-1 Series Modular Services Card revision B
MAIN: board type 500063
800-27067-08 rev A0
dev N/A
S/N SAD1403008B
PCA: 73-10334-08 rev A0
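
For chassis with many slots, the TAN and serial number can be pulled out of saved show diag output with a script. The following Python sketch is an assumption-laden illustration, not a Cisco utility: the function name find_8x10gbe and the regular expressions are hypothetical and rely only on the layout of the sample output above (a PLIM header line containing "8x10GbE", followed by a line with the 800-level number and a line starting with S/N).

```python
import re

# A PLIM/CARD/NODE line starts a new block in the sample output.
HEADER_RE = re.compile(r"^\s*(PLIM|CARD|NODE)\s+(\S+)\s*:?\s*(.*)$")
TAN_RE = re.compile(r"\b(800-\d{5}-\d{2})\s+rev\s+(\S+)")
SN_RE = re.compile(r"\bS/N\s+(\S+)")


def find_8x10gbe(show_diag_text):
    """Return one dict per 8x10GbE PLIM block found (hypothetical helper)."""
    results = []
    current = None
    for line in show_diag_text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            # Any new header ends the previous block.
            current = None
            if header.group(1) == "PLIM" and "8x10GbE" in header.group(3):
                current = {"location": header.group(2),
                           "tan": None, "rev": None, "serial": None}
                results.append(current)
            continue
        if current is None:
            continue
        tan = TAN_RE.search(line)
        if tan and current["tan"] is None:
            current["tan"], current["rev"] = tan.group(1), tan.group(2)
        serial = SN_RE.search(line)
        if serial and current["serial"] is None:
            current["serial"] = serial.group(1)
    return results


# Excerpt of the sample output above; the serial number is the
# placeholder used in this notice, not a real one.
sample = """\
PLIM 0/PL0/* : Cisco CRS-1 Series 8x10GbE Interface Module
MAIN: board type 600095
800-24545-05 rev A0
dev N/A
S/N SAD1nnnnnn
PCA: 73-9231-09 rev A0
CARD 0/1/* : Cisco CRS-1 Series Modular Services Card revision B
MAIN: board type 500063
800-27067-08 rev A0
S/N SAD1403008B
"""

for card in find_8x10gbe(sample):
    print(card["location"], card["tan"], card["rev"], card["serial"])
```

Only the PLIM block is reported; the Modular Services Card that follows is skipped because its header does not contain "8x10GbE". Output layout can differ between software releases, so the patterns should be checked against real output before use.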

B) Physically Checking the Line Card

Refer to the picture below for the location of the TAN and serial number. The picture shows a suspect 8-10GBE LC.

For More Information

If you require further assistance, or if you have any further questions regarding this field notice, please contact the Cisco Systems Technical Assistance Center (TAC).

Receive Email Notification For New Field Notices

Cisco Notification Service: Set up a profile to receive email updates about reliability, safety, network security, and end-of-sale issues for the Cisco products you specify.