
Cisco Carrier Routing System

Field Notice: FN - 62964 - CRS: Some CRS1-SIP-800 Boards Have a Fuse Reliability Issue Which Can Degrade Over Time - Workaround Fix on Failure


Revised July 20, 2009
Originally published November 15, 2007


NOTICE:

THIS FIELD NOTICE IS PROVIDED ON AN "AS IS" BASIS AND DOES NOT IMPLY ANY KIND OF GUARANTEE OR WARRANTY, INCLUDING THE WARRANTY OF MERCHANTABILITY. YOUR USE OF THE INFORMATION ON THE FIELD NOTICE OR MATERIALS LINKED FROM THE FIELD NOTICE IS AT YOUR OWN RISK. CISCO RESERVES THE RIGHT TO CHANGE OR UPDATE THIS FIELD NOTICE AT ANY TIME.

Revision History

Revision  Date         Comment
1.2       20-JUL-2009  Because the replacement program has been completed, the replacement form has been removed from this field notice. Failures matching the conditions in this field notice will be serviced through the standard RMA process.
1.1       27-NOV-2007  The Workaround/Solution section was updated to include an additional note about using the "ARFN" RMA code when a SIP failure is related to this field notice.
1.0       15-NOV-2007  Initial public release.

Products Affected

Product           Top Assembly         Printed Circuit Assembly   Comments
                  Part No. / Rev.      Part No. / Rev.
CRS1-SIP-800-SK   800-23819-04 / B0    73-8982-06 / H0            Failing boards shipped between 06/20/2006 and 06/22/2007. An exact SN list is provided below.
CRS1-SIP-800      800-23819-04 / B0    73-8982-06 / H0

Problem Description

CRS1-SIP-800 cards built between June 20, 2006 and June 22, 2007 that are operating in a CRS chassis may experience a fuse failure, after which the board ceases to operate. Failures have occurred after the board has been installed and in steady-state operation for about two months, and may also occur during initial power-up.

Note: There are no safety concerns regarding the failure mode of this fuse.

Background

In June 2006, a new fuse was introduced on the CRS1-SIP-800 to meet industry regulations. In one specific placement on the CRS1-SIP-800, this fuse has shown long-term reliability issues; in some networks, failures have occurred after two months of operation.
The rate of degradation varies with the fuse's metallic layers, which differ from part to part, and with the ambient temperature of the CRS system. Ambient temperatures of 35 degrees C and above increase the likelihood of fuse failure.

Refer to the list of affected CRS1-SIP-800 serial numbers to see which boards are affected.

Problem Symptom

The CRS1-SIP-800 will fail to power up when the fuse has failed.

Sample error log for when the CRS1-SIP-800 fails:
SP/0/0/SP:Apr 17 09:54:10.831 : i2c_server[58]: %PLATFORM-I2C-6-LC_POWER_FAIL : LC power-up failed because - 5V_A or 5V_B or 5V_C is bad - as indicated by power status registers
SP/0/0/SP:Apr 17 09:54:11.051 : i2c_server[58]: %PLATFORM-I2C-6-LC_POWER_FAIL : LC power-up failed because - 1.5V or 1.8V or 3.3V is bad - as indicated by power status registers
SP/0/0/SP:Apr 17 09:54:11.059 : i2c_server[58]: %PLATFORM-I2C-6-LC_POWER_FAIL : LC power-up failed because - egress-pse power is bad - as indicated by power status registers
SP/0/0/SP:Apr 17 09:54:11.061 : i2c_server[58]: %PLATFORM-I2C-6-LC_POWER_FAIL : LC power-up failed because - CPU power is bad - as indicated by power status registers
SP/0/0/SP:Apr 17 09:54:11.063 : i2c_server[58]: %PLATFORM-I2C-6-LC_POWER_FAIL : LC power-up failed because - PLIM power is bad - as indicated by power status registers
SP/0/0/SP:Apr 17 09:54:11.069 : i2c_server[58]: %PLATFORM-I2C-7-LC_BAD_VRM_INFO : LC power-up failed because - 1.6V1 on CPU is bad - as indicated by power good registers
SP/0/0/SP:Apr 17 09:54:11.071 : i2c_server[58]: %PLATFORM-I2C-7-LC_BAD_VRM_INFO : LC power-up failed because - 1.8V on CPU is bad - as indicated by power good registers
SP/0/0/SP:Apr 17 09:54:11.073 : i2c_server[58]: %PLATFORM-I2C-7-LC_BAD_VRM_INFO : LC power-up failed because - 2.5V on CPU is bad - as indicated by power good registers
SP/0/0/SP:Apr 17 09:54:11.077 : i2c_server[58]: %PLATFORM-I2C-7-LC_BAD_VRM_INFO : LC power-up failed because - 3.3V on CPU is bad - as indicated by power good registers
SP/0/0/SP:Apr 17 09:54:11.086 : i2c_server[58]: %PLATFORM-I2C-7-LC_BAD_VRM_INFO : LC power-up failed because - 1.2V on METRO1 is bad - as indicated by power good registers
SP/0/0/SP:Apr 17 09:54:11.088 : i2c_server[58]: %PLATFORM-I2C-7-LC_BAD_VRM_INFO : LC power-up failed because - 2.5V on METRO1 is bad - as indicated by power good registers
SP/0/0/SP:Apr 17 09:54:11.094 : i2c_server[58]: %PLATFORM-I2C-7-LC_BAD_VRM_INFO : LC power-up failed because - 5V_C on LC is bad - as indicated by power good registers
SP/0/0/SP:Apr 17 09:54:11.096 : i2c_server[58]: %PLATFORM-I2C-7-LC_BAD_VRM_INFO : LC power-up failed because - 3.3V on LC is bad - as indicated by power good registers
SP/0/0/SP:Apr 17 09:54:11.098 : i2c_server[58]: %PLATFORM-I2C-7-LC_BAD_VRM_INFO : LC power-up failed because - 1.8V on LC is bad - as indicated by power good registers
SP/0/0/SP:Apr 17 09:54:11.100 : i2c_server[58]: %PLATFORM-I2C-7-LC_BAD_VRM_INFO : LC power-up failed because - 1.5V on LC is bad - as indicated by power good registers
SP/0/0/SP:Apr 17 09:54:11.103 : i2c_server[58]: %PLATFORM-I2C-7-LC_BAD_VRM_INFO : LC power-up failed because - power-supply on PLIM is bad - as indicated by power good registers
SP/0/0/SP:Apr 17 09:54:11.105 : i2c_server[58]: %PLATFORM-I2C-7-LC_BAD_VRM_INFO : LC power-up failed because - 2.5V on METRO0 is bad - as indicated by power good registers
SP/0/0/SP:Apr 17 09:54:11.111 : i2c_server[58]: %PLATFORM-I2C-7-LC_BAD_VRM_INFO : LC power-up failed because - 5V_A on LC is bad - as indicated by power good registers
SP/0/0/SP:Apr 17 09:54:11.113 : i2c_server[58]: %PLATFORM-I2C-7-LC_BAD_VRM_INFO : LC power-up failed because - 5V_B on LC is bad - as indicated by power good registers
SP/0/0/SP:Apr 17 09:54:34.872 : envmon[104]: %PLATFORM-CCTL-3-ERROR_EXIT : Envmon process exiting because read the board type from hardware, error code 'Subsystem(8191)' detected the 'unknown' condition 'Code(63)': Unknown Error(511)
SP/0/0/SP:Apr 17 09:54:34.941 : sysmgr[73]: %OS-SYSMGR-3-ERROR : envmon(104) (fail count 6) will be respawned in 30 seconds
SP/0/0/SP:Apr 17 09:54:34.936 : sysmgr[73]: envmon(1) (jid 104) abnormally terminated, restart scheduled
SP/0/0/SP:Apr 17 09:54:41.981 : /pkg/bin/sysmgr_log[65582]: %OS-SYSMGR-4-CHECK_LOG : Node shutdown script completed; log copied to /net/node0_RP0_CPU0/harddisk:/shutdown/node0_0_SP.log.gz

Sample error log for when the CRS1-SIP-800 fails during a subsequent boot attempt:

LC/0/0/CPU0:Apr 12 07:23:21.449 : cpuctrl[220]: %PLATFORM-CPUCTRL-3-HW_DETECTED_ERROR_LINK : HW error interrupt link, port = 9 interrupt_id = 0x0, port_link_error = 0x00000001, port_link_crc_count = 0x00000003
LC/0/0/CPU0:Apr 12 07:23:21.458 : pse_driver[173]: %L2-PSE-7-ERR_EXIT : Exit on error: M0: Head FIFO overflow. Threshold value=0xffffffff: Caused by Input/output error : pkg/bin/pse_driver : (PID=36914) : -Traceback= 482251f0 4820f7e4 48213980 48213cb0 48214204 fc5d3dd4 fc5cd020 fc1b7f88
LC/0/0/CPU0:Apr 12 07:23:21.454 : egressq[125]: %L2-EGRESSQ-3-HW_ERROR : Sharq ENQ packet length error occurred.
RP/0/RP0/CPU0:Apr 12 07:23:35.068 : shelfmgr[333]: %PLATFORM-SHELFMGR-3-NODE_RESET_BRINGDOWN : Reset node 0/0/CPU0 due to heartbeat loss
LC/0/5/CPU0:Apr 12 07:23:38.427 : ingressq[156]: %DRIVERS-INGRESSQ_DLL-4-LNS_LOP_DROP : low availability of planes, aggr cell drop count: 110
LC/0/4/CPU0:Apr 12 07:23:38.430 : ingressq[156]: %DRIVERS-INGRESSQ_DLL-4-LNS_LOP_DROP : low availability of planes, aggr cell drop count: 130
LC/0/1/CPU0:Apr 12 07:23:38.460 : ingressq[156]: %DRIVERS-INGRESSQ_DLL-4-LNS_LOP_DROP : low availability of planes, aggr cell drop count: 356
RP/0/RP0/CPU0:Apr 12 07:23:40.219 : invmgr[205]: %PLATFORM-INV-6-NODE_STATE_CHANGE : Node: 0/0/SP, state: BRINGDOWN
RP/0/RP0/CPU0:Apr 12 07:23:40.609 : invmgr[205]: %PLATFORM-INV-6-NODE_STATE_CHANGE : Node: 0/0/CPU0, state: BRINGDOWN
RP/0/RP0/CPU0:Apr 12 07:23:40.920 : invmgr[205]: %PLATFORM-INV-6-NODE_STATE_CHANGE : Node: 0/0/CPU0, state: PRESENT
RP/0/RP0/CPU0:Apr 12 07:23:51.232 : shelfmgr[333]: %PLATFORM-MBIMGR-7-IMAGE_VALIDATED : 0/0/SP: MBI bootflash:mbis/hfr-os-mbi-3.3.1.CSCek61756-1.0.0/cfc2413f7ad0e7e65a1c7f12c0?f7aec4/mbihfr-sp.vm? validated
RP/0/RP0/CPU0:Apr 12 07:23:52.085 : invmgr[205]: %PLATFORM-INV-6-NODE_STATE_CHANGE : Node: 0/0/SP, state: MBI-BOOTING
RP/0/RP0/CPU0:Apr 12 07:24:08.049 : invmgr[205]: %PLATFORM-INV-6-NODE_STATE_CHANGE : Node: 0/0/SP, state: MBI-RUNNING
RP/0/RP0/CPU0:Apr 12 07:24:25.426 : invmgr[205]: %PLATFORM-INV-6-NODE_STATE_CHANGE : Node: 0/0/SP, state: IOS XR RUN
SP/0/0/SP:Apr 12 07:24:06.014 : init[65541]: %OS-INIT-7-MBI_STARTED : total time 8.478 seconds
SP/0/0/SP:Apr 12 07:24:16.489 : sysmgr[73]: %OS-SYSMGR-5-NOTICE : Card is COLD started
SP/0/0/SP:Apr 12 07:24:18.712 : init[65541]: %OS-INIT-7-INSTALL_READY : total time 21.192 seconds
SP/0/0/SP:Apr 12 07:24:34.784 : envmon[104]: %PLATFORM-CCTL-3-ERROR_EXIT : Envmon process exiting because read the board type from hardware, error code 'Subsystem(8191)' detected the 'unknown' condition 'Code(63)': Unknown Error(511)
SP/0/0/SP:Apr 12 07:24:36.588 : sysmgr[73]: envmon(1) (jid 104) abnormally terminated, restart scheduled
SP/0/0/SP:Apr 12 07:24:39.405 : sysmgr[73]: %OS-SYSMGR-3-ERROR : envmon(104) (fail count 2) will be respawned in 5 seconds
SP/0/0/SP:Apr 12 07:24:39.399 : sysmgr[73]: envmon(1) (jid 104) abnormally terminated, restart scheduled
SP/0/0/SP:Apr 12 07:24:45.363 : envmon[104]: %PLATFORM-CCTL-3-ERROR_EXIT : Envmon process exiting because read the board type from hardware, error code 'Subsystem(8191)' detected the 'unknown' condition 'Code(63)': Unknown Error(511)
SP/0/0/SP:Apr 12 07:24:45.406 : sysmgr[73]: %OS-SYSMGR-3-ERROR : envmon(104) (fail count 3) will be respawned in 5 seconds
SP/0/0/SP:Apr 12 07:24:45.402 : sysmgr[73]: envmon(1) (jid 104) abnormally terminated, restart scheduled
SP/0/0/SP:Apr 12 07:24:51.778 : envmon[104]: %PLATFORM-CCTL-3-ERROR_EXIT : Envmon process exiting because read the board type from hardware, error code 'Subsystem(8191)' detected the 'unknown' condition 'Code(63)': Unknown Error(511)
SP/0/0/SP:Apr 12 07:24:51.853 : sysmgr[73]: %OS-SYSMGR-3-ERROR : envmon(104) (fail count 4) will be respawned in 5 seconds
SP/0/0/SP:Apr 12 07:24:51.849 : sysmgr[73]: envmon(1) (jid 104) abnormally terminated, restart scheduled
SP/0/0/SP:Apr 12 07:24:57.662 : envmon[104]: %PLATFORM-CCTL-3-ERROR_EXIT : Envmon process exiting because read the board type from hardware, error code 'Subsystem(8191)' detected the 'unknown' condition 'Code(63)': Unknown Error(511)
SP/0/0/SP:Apr 12 07:24:57.714 : sysmgr[73]: %OS-SYSMGR-2-REBOOT : reboot required, process (envmon) reason (maximum restart attempts exceeded)
SP/0/0/SP:Apr 12 07:24:58.111 : sysmgr[73]: %OS-SYSMGR-3-ERROR : envmon(1) (jid 104) can not be restarted, entering slow-restart mode
SP/0/0/SP:Apr 12 07:24:58.118 : sysmgr[73]: %OS-SYSMGR-3-ERROR : envmon(104) (fail count 5) will be respawned in 30 seconds
SP/0/0/SP:Apr 12 07:24:57.709 : sysmgr[73]: envmon(1) (jid 104) abnormally terminated, restart scheduled
SP/0/0/SP:Apr 12 07:24:58.129 : sysmgr[73]: %OS-SYSMGR-3-ERROR : envmon[104] (pid 69678) has not sent proc-ready within 45 seconds
SP/0/0/SP:Apr 12 07:24:58.614 : /pkg/bin/sysmgr_log[65585]: %OS-SYSMGR-4-CHECK_LOG : /pkg/bin/shutdown_debug_script invoked by sysmgr. Reason: (envmon) maximum restart attempts exceeded, Compressed output will be saved.
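
As a rough triage aid, the log signatures shown above can be searched for in a saved log. The following Python sketch assumes the log has been exported as plain text with one message per line; the function name scan_log and the choice of mnemonics to count are illustrative assumptions, not a Cisco tool. A match only suggests a line card power-up failure consistent with this field notice; it does not confirm the fuse failure.

```python
from collections import defaultdict
import re

# Mnemonics seen in the sample failure logs in this field notice. The value is
# an extra substring that must also appear, or None if the mnemonic suffices.
SIGNATURES = {
    "%PLATFORM-I2C-6-LC_POWER_FAIL": None,
    "%PLATFORM-I2C-7-LC_BAD_VRM_INFO": None,
    # ERROR_EXIT is a generic mnemonic, so only count it when envmon raised it.
    "%PLATFORM-CCTL-3-ERROR_EXIT": "envmon[",
}

# Node name at the start of a message, e.g. "SP/0/0/SP" or "LC/0/0/CPU0".
NODE_RE = re.compile(r"^((?:SP|LC|RP)/\S+?):")


def scan_log(lines):
    """Count signature matches per node; returns {node: {mnemonic: count}}."""
    hits = defaultdict(lambda: defaultdict(int))
    for line in lines:
        m = NODE_RE.match(line.strip())
        if m is None:
            continue  # not a message line (for example, a wrapped continuation)
        for mnemonic, required in SIGNATURES.items():
            if mnemonic in line and (required is None or required in line):
                hits[m.group(1)][mnemonic] += 1
    # Convert nested defaultdicts to plain dicts for printing and comparison.
    return {node: dict(counts) for node, counts in hits.items()}


if __name__ == "__main__":
    sample = [
        "SP/0/0/SP:Apr 17 09:54:10.831 : i2c_server[58]: "
        "%PLATFORM-I2C-6-LC_POWER_FAIL : LC power-up failed because - "
        "5V_A or 5V_B or 5V_C is bad - as indicated by power status registers",
        "SP/0/0/SP:Apr 17 09:54:34.872 : envmon[104]: "
        "%PLATFORM-CCTL-3-ERROR_EXIT : Envmon process exiting",
    ]
    print(scan_log(sample))
```

Nodes that report several of these mnemonics in a short window, as the SP/0/0/SP node does in the samples above, are the ones worth checking against the hardware-level steps below.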

Workaround/Solution

Cisco recommends a fix-on-fail strategy for this problem: replace affected CRS1-SIP-800 boards when they fail.

An upgrade program was previously offered to replace potentially affected but otherwise working product. That program has ended, and Cisco now replaces only product that has actually failed. Use the standard RMA process to replace failed product.

Products manufactured from approximately June 22, 2007 onward under Engineering Change Order (ECO) E090453 or deviation D091788 are free of this problem. Refer to How To Identify Hardware Levels below for instructions on viewing the TAN version and deviation of in-service product.

Note: Boards whose serial numbers appear on the affected serial number list are NOT affected by this problem if they have deviation D091788 or ECO E090453 applied.

Note: SPAs that are installed in the CRS1-SIP-800 should be removed prior to returning any CRS1-SIP-800s to Cisco.

Note: Any RMAs for SIP failures related to this field notice need to be coded with the "ARFN" RMA code.

How To Identify Hardware Levels

Check the hardware level of the CRS1-SIP-800 boards by following these steps:

  1. Check the CRS1-SIP-800 TAN (Top Assembly Number) in the show diag output, run in admin mode as in the sample output below. If the TAN is 800-23819-05 or higher, the CRS1-SIP-800 is already upgraded and does NOT need replacing; no further checks are necessary.
  2. Check the deviation number in the same show diag output. If the deviation number is 091788, the CRS1-SIP-800 is already upgraded and does NOT need replacing; no further checks are necessary.
  3. If the TAN is 800-23819-04 or lower and the deviation number is not 091788, check each suspect serial number with the SN Validation tool. If the tool confirms that a CRS1-SIP-800 is affected, replace the board using the Cisco RMA procedure.

Sample show diag output (in admin mode) for identifying a CRS1-SIP-800 that needs to be replaced:

PLIM 0/7/CPU0 : Cisco Carrier Routing System SPA Interface Processor Card
MAIN: board type 580070
800-23819-03 rev D0
dev 086877

S/N SAD--------
PCA: 73-8982-06 rev D0
PID: CRS1-SIP-800
VID: V01
CLEI: COUIAAMCAA
ECI: 134912
Board State : IOS XR RUN
PLD: Motherboard: 0x0025, Processor: 0xda13, Power: N/A
MONLIB: QNXFFS Monlib Version 3.1
ROMMON: Version 1.45(20070517:152402) [CRS-1 ROMMON]
Interface port config: 0 Ports
Optical reach type: Unknown
Connector type: MT-P


Sample show diag output (in admin mode) for identifying a CRS1-SIP-800 that does not need to be replaced:

PLIM 0/7/CPU0 : Cisco Carrier Routing System SPA Interface Processor Card
MAIN: board type 580070
800-23819-05 rev A0
dev 091788

S/N SAD--------
PCA: 73-8982-06 rev D0
PID: CRS1-SIP-800
VID: V01
CLEI: COUIAAMCAA
ECI: 134912
Board State : IOS XR RUN
PLD: Motherboard: 0x0025, Processor: 0xda13, Power: N/A
MONLIB: QNXFFS Monlib Version 3.1
ROMMON: Version 1.45(20070517:152402) [CRS-1 ROMMON]
Interface port config: 0 Ports
Optical reach type: Unknown
Connector type: MT-P
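
The three checks in How To Identify Hardware Levels can be sketched in Python as follows. This is a minimal sketch, assuming the input is the text of a single CRS1-SIP-800 entry from show diag, formatted like the samples above; the function name classify_sip800 and its return labels are hypothetical. It does not check for ECO E090453, which does not appear in the sample output, and it cannot replace the SN Validation tool for step 3.

```python
import re

# Thresholds from the "How To Identify Hardware Levels" steps in this notice.
TAN_PREFIX = "800-23819"
FIXED_TAN_VERSION = 5          # 800-23819-05 or higher is already upgraded
FIXED_DEVIATION = "091788"     # deviation D091788 is already upgraded

# Matches a line such as "800-23819-03 rev D0"; group 1 is the TAN version.
TAN_RE = re.compile(r"^\s*" + re.escape(TAN_PREFIX) + r"-(\d+)\s+rev\s+\S+",
                    re.MULTILINE)
# Matches a line such as "dev 086877"; group 1 is the deviation number.
DEV_RE = re.compile(r"^\s*dev\s+(\d+)", re.MULTILINE)


def classify_sip800(show_diag_text: str) -> str:
    """Apply the three identification steps to one show diag entry.

    Hypothetical return labels:
      "upgraded"     - step 1 (TAN -05 or higher) or step 2 (deviation 091788)
      "check-serial" - step 3: TAN -04 or lower without the fixed deviation,
                       so the serial number must be checked separately
      "unknown"      - no 800-23819 TAN line was found
    """
    tan = TAN_RE.search(show_diag_text)
    if tan is None:
        return "unknown"
    # Step 1: a TAN version of 05 or higher means the board is already fixed.
    if int(tan.group(1)) >= FIXED_TAN_VERSION:
        return "upgraded"
    # Step 2: deviation 091788 also means the board is already fixed.
    dev = DEV_RE.search(show_diag_text)
    if dev is not None and dev.group(1) == FIXED_DEVIATION:
        return "upgraded"
    # Step 3: remaining boards need a serial number check.
    return "check-serial"


if __name__ == "__main__":
    sample = """PLIM 0/7/CPU0 : Cisco Carrier Routing System SPA Interface Processor Card
MAIN: board type 580070
800-23819-03 rev D0
dev 086877
"""
    print(classify_sip800(sample))  # check-serial
```

The regular expressions tolerate leading whitespace, so the same function should accept indented show diag output as well as the flattened samples shown here.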

For More Information

If you require further assistance, or if you have any further questions regarding this field notice, please contact the Cisco Systems Technical Assistance Center (TAC).
