

Field Notice: FN - 63752 - MDS 9000 Might Fail to Boot After a Software Upgrade or Power Cycle - Fix on Failure


Revised September 4, 2014
March 3, 2014


NOTICE:

THIS FIELD NOTICE IS PROVIDED ON AN "AS IS" BASIS AND DOES NOT IMPLY ANY KIND OF GUARANTEE OR WARRANTY, INCLUDING THE WARRANTY OF MERCHANTABILITY. YOUR USE OF THE INFORMATION ON THE FIELD NOTICE OR MATERIALS LINKED FROM THE FIELD NOTICE IS AT YOUR OWN RISK. CISCO RESERVES THE RIGHT TO CHANGE OR UPDATE THIS FIELD NOTICE AT ANY TIME.

Revision History

Revision  Date         Comment
1.1       04-SEP-2014  Updated the Workaround/Solution Section
1.0       03-MAR-2014  Initial Public Release

Products Affected

DS-C9124-K9 
DS-X9112
DS-X9124
DS-X9148
DS-X9530-SF2-K9
DS-X9530-SF2AK9
DS-HP-FC-K9
DS-IBM-FC-K9

Problem Description

The Cisco MDS 9000 Series switches (listed in the Products Affected section) might fail to boot up after a software upgrade or other user action where the board requires a power cycle operation.

Background

Cisco has been working with some customers on an issue related to memory components manufactured by a single supplier between 2005 and 2010. These memory components are widely used across the industry and are included in a number of Cisco products. 

Although the majority of Cisco products using these components are experiencing field failure rates below expected levels, some components may fail earlier than anticipated. A handful of our customers have recently experienced a higher number of failures, leading us to change our approach to managing this issue. 

While other vendors have chosen to address this issue in different ways, Cisco believes its approach is the best course of action for its customers. Despite the cost, we are demonstrating that we always make customer satisfaction a top priority. Customers can learn more about this topic on the Memory Component Issue web page.

PLEASE NOTE - The products listed in this Field Notice have lower than expected failure rates. This assessment is based on actual usage of affected memory components, observed field failure rates and product replacements since 2012.

A degraded component will not affect the ongoing operation of a device, but will be exposed by a subsequent power cycle event. This event will result in a hard failure of the device, which cannot be recovered by a reboot or additional power cycle. For these reasons, additional caution is recommended for operational activities requiring the simultaneous power cycling of multiple devices. This issue has been observed most commonly on devices that have been in service for 24 months or more.

Problem Symptoms

If the suspected hardware (listed in the Products Affected section) has been in operation for approximately 24 months or more, it might fail to boot up because of a memory failure that is exposed by a power cycle event. Any of these actions can trigger such an event:

  • Software upgrade
  • Reload of the entire product
  • Reload after installation
  • Online Insertion and Removal/Replacement (OIR)

Note: This issue does not affect boards while the boards are in operation. The board failure might occur after one or more of the actions listed are executed. An In-Service Software Upgrade (ISSU) does not cause a loss of power to the module.

The problem symptoms include these messages that might be observed on the Supervisor console:

2012 Apr 14 21:57:22 mds9506 %PLATFORM-2-MOD_DETECT: Module 1 detected Module-Type 1/2/4 Gbps FC Module Model DS-X9148
2012 Apr 14 21:57:22 mds9506 %PLATFORM-2-MOD_PWRUP: Module 1 powered up
2012 Apr 14 22:00:24 mds9506 %MODULE-2-MOD_FAIL: Initialization of module 1 failed
2012 Apr 14 22:00:25 mds9506 %MODULE-2-MOD_FAIL: Initialization of module 1 failed
2012 Apr 14 22:00:25 mds9506 %PLATFORM-2-MOD_PWRDN: Module 1 powered down

switch# show mod 1
Mod  Ports  Module-Type           Model     Status
---  -----  --------------------  --------  ----------
1    48     1/2/4 Gbps FC Module  DS-X9148  powered-dn

Mod  Power-Status  Reason
---  ------------  ---------------------------
1    powered-dn    Reset (powered-down) because module does not boot
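The console messages above can also be picked out of a saved log. The following is a minimal Python sketch, not a Cisco tool: the function name and the regular expressions are assumptions based only on the message text shown in this notice. It reports modules that logged both an initialization failure and a power-down. A match only indicates the symptom pattern; it does not by itself confirm a memory failure.

```python
import re

# Patterns are assumptions derived from the sample messages in this notice.
# "\s*" tolerates a line break between the message tag and its text.
MOD_FAIL = re.compile(r"%MODULE-2-MOD_FAIL:\s*Initialization of module (\d+) failed")
MOD_PWRDN = re.compile(r"%PLATFORM-2-MOD_PWRDN:\s*Module (\d+) powered")


def modules_failing_to_boot(log_text):
    """Return sorted module numbers that logged both MOD_FAIL and MOD_PWRDN."""
    failed = {int(m) for m in MOD_FAIL.findall(log_text)}
    powered_down = {int(m) for m in MOD_PWRDN.findall(log_text)}
    return sorted(failed & powered_down)


sample = """\
2012 Apr 14 21:57:22 mds9506 %PLATFORM-2-MOD_PWRUP: Module 1 powered up
2012 Apr 14 22:00:24 mds9506 %MODULE-2-MOD_FAIL: Initialization of module 1 failed
2012 Apr 14 22:00:25 mds9506 %PLATFORM-2-MOD_PWRDN: Module 1 powered down
"""
print(modules_failing_to_boot(sample))
```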

This failure message might appear due to other non-related failures. Use the output from the show module internal activity module command in order to further validate a memory failure.

The output from this command includes POST codes. A 0xE2 value of PWR_MGMT_LCP_STATUS_REG indicates a memory failure. Here is an example:

switch# show module internal activity module x
****************** module activity log *************** 
1) At 999305 usecs after Thu Feb 20 07:22:55 2014 
DEBUG: Non-MTS event 
category_id(2), event_id(104), resource_id(1794) 

2) At 999017 usecs after Thu Feb 20 07:22:55 2014 
Holding current transaction 
OP: MTS_OPC_PFM_MODULE_POWER_STATUS 
- - - - - - - - - - - - - - - - - - 
- - - - - - - - - - - - - - - - - - 
- - - - - - - - - - - - - - - - - - 
- - - - - - - - - - - - - - - - - - 
6) At 998880 usecs after Thu Feb 20 07:22:55 2014 
Received MTS_OPC_PFM_MODULE_POWER_STATUS from node 1281 Platform manager 
Power status: Powered down 
- - - - - - - - - - - - - - - - - - 
- - - - - - - - - - - - - - - - - - 
- - - - - - - - - - - - - - - - - - 
- - - - - - - - - - - - - - - - - - 
11) At 518989 usecs after Thu Feb 20 07:22:55 2014 
PWR_MGMT_SCRATCH_A_REG(0x1) = 0xdb, PWR_MGMT_POST_CODE_REG(0xb) = 0xe2 
12) At 518914 usecs after Thu Feb 20 07:22:55 2014 
PWR_MGMT_REVISION_REG = 0x6e, PWR_MGMT_IO_CNTRL_REG = 0xc4 
PWR_MGMT_TRIPLE_IO_STATUS_REG1 = 0x6e, w 
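The POST code check can be sketched in Python as follows. This is an illustrative helper, not part of any Cisco software: the function name and the regular expression are assumptions based on the register lines in the example output. Because the text names PWR_MGMT_LCP_STATUS_REG while the example shows 0xe2 in PWR_MGMT_POST_CODE_REG, the sketch reports every PWR_MGMT register that holds 0xE2 and leaves the interpretation to the reader.

```python
import re

# Assumed line shape, taken from the example output:
#   PWR_MGMT_POST_CODE_REG(0xb) = 0xe2
# The "(0xb)" register offset is treated as optional.
REGISTER = re.compile(r"(PWR_MGMT_\w+)(?:\(0x[0-9a-fA-F]+\))?\s*=\s*(0x[0-9a-fA-F]+)")

MEMORY_FAILURE_CODE = 0xE2  # value this notice associates with a memory failure


def registers_with_memory_code(activity_log):
    """Return the names of PWR_MGMT registers whose logged value is 0xE2."""
    hits = []
    for name, value in REGISTER.findall(activity_log):
        if int(value, 16) == MEMORY_FAILURE_CODE and name not in hits:
            hits.append(name)
    return hits


sample_log = """\
PWR_MGMT_SCRATCH_A_REG(0x1) = 0xdb, PWR_MGMT_POST_CODE_REG(0xb) = 0xe2
PWR_MGMT_REVISION_REG = 0x6e, PWR_MGMT_IO_CNTRL_REG = 0xc4
"""
print(registers_with_memory_code(sample_log))
```

An empty result means that no register in the captured text holds 0xE2, in which case the boot failure may have one of the other, non-related causes mentioned above.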

Workaround/Solution

Fix on Failure Replacement Guidelines: Request RMA product through normal service support channels.

If you need assistance in order to determine which hardware part(s) might need replacement, consult the error messages documented in the Problem Symptoms section.

For assistance with replacement part disposition, reference this table.

Affected PID                      Action               Replacement PID                                        Quantity
DS-C9124-K9 or DS-C9124AP-K9      Replace entire unit  DS-C9124-K9 or DS-C9124AP-K9                           1
DS-X9112                          Replace entire card  DS-X9112                                               1
DS-X9124                          Replace entire card  DS-X9124                                               1
DS-X9148                          Replace entire card  DS-X9148                                               1
DS-X9530-SF2-K9                   Replace entire card  DS-X9530-SF2-K9                                        1
DS-X9530-SF2AK9                   Replace entire card  DS-X9530-SF2AK9                                        1
DS-HP-FC-K9 or DS-HP-FC-LIC-K9    Replace entire card  Customer should work with HP support for information   1
DS-IBM-FC-K9 or DS-IBM-FC-LIC-K9  Replace entire card  Customer should work with IBM support for information  1

How To Identify Hardware Levels

Enter the show inventory command in order to obtain the Product ID (PID). If the CLI is not available, physically inspect the hardware in order to locate the PID.
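If the show inventory output has been saved to a file, the PIDs can be compared against this notice with a short script. The Python sketch below is illustrative only. It assumes that each inventory entry carries a field of the form "PID: <value>". The sample text, including the chassis PID and the serial numbers, is hypothetical and is not captured from a real switch. The set of affected PIDs is copied from the Products Affected section and the replacement table.

```python
import re

# PIDs named in this notice (Products Affected section and replacement table).
AFFECTED_PIDS = {
    "DS-C9124-K9", "DS-C9124AP-K9",
    "DS-X9112", "DS-X9124", "DS-X9148",
    "DS-X9530-SF2-K9", "DS-X9530-SF2AK9",
    "DS-HP-FC-K9", "DS-HP-FC-LIC-K9",
    "DS-IBM-FC-K9", "DS-IBM-FC-LIC-K9",
}

# Assumption: the inventory output contains fields such as "PID: DS-X9148".
PID_FIELD = re.compile(r"PID:\s*([A-Za-z0-9-]+)")


def affected_pids(inventory_text):
    """Return the sorted, de-duplicated affected PIDs found in the text."""
    found = PID_FIELD.findall(inventory_text)
    return sorted({pid for pid in found if pid in AFFECTED_PIDS})


# Hypothetical sample; serial numbers are placeholders.
sample_inventory = """\
NAME: "Chassis", DESCR: "MDS 9506 chassis"
PID: DS-C9506 , VID: V01 , SN: EXAMPLE0001
NAME: "Slot 1", DESCR: "1/2/4 Gbps FC Module"
PID: DS-X9148 , VID: V01 , SN: EXAMPLE0002
"""
print(affected_pids(sample_inventory))
```

A PID that appears in the result identifies hardware covered by this notice. It does not indicate that the hardware has failed or will fail.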

For More Information

If you require further assistance, or if you have any further questions regarding this field notice, please contact the Cisco Systems Technical Assistance Center (TAC).

Receive Email Notification For New Field Notices

Cisco Notification Service—Set up a profile to receive email updates about reliability, safety, network security, and end-of-sale issues for the Cisco products you specify.