Field Notice: FN - 63751 - Nexus 7000 - Products Affected Might Fail to Boot Up After a Software Upgrade or Power Cycle - Fix on Failure

Revised September 4, 2014
March 3, 2014


NOTICE:

THIS FIELD NOTICE IS PROVIDED ON AN "AS IS" BASIS AND DOES NOT IMPLY ANY KIND OF GUARANTEE OR WARRANTY, INCLUDING THE WARRANTY OF MERCHANTABILITY. YOUR USE OF THE INFORMATION ON THE FIELD NOTICE OR MATERIALS LINKED FROM THE FIELD NOTICE IS AT YOUR OWN RISK. CISCO RESERVES THE RIGHT TO CHANGE OR UPDATE THIS FIELD NOTICE AT ANY TIME.

Revision History

Revision  Date         Comment
1.2       04-SEP-2014  Updated the Problem Symptoms and Workaround/Solution Sections
1.1       05-MAR-2014  Updated Products Affected
1.0       03-MAR-2014  Initial Public Release

Products Affected

Products Affected
N7K-M148GT-11
N7K-M132XP-12 

Problem Description

The Nexus 7000 linecards (listed in the Products Affected section) might fail to boot up after a software upgrade or other user action where the board requires a power cycle operation.

Background

Cisco has been working with some customers on an issue related to memory components manufactured by a single supplier between 2005 and 2010. These memory components are widely used across the industry and are included in a number of Cisco products.

Although the majority of Cisco products using these components are experiencing field failure rates below expected levels, some components may fail earlier than anticipated. A handful of our customers have recently experienced a higher number of failures, leading us to change our approach to managing this issue.

While other vendors have chosen to address this issue in different ways, Cisco believes its approach is the best course of action for its customers. Despite the cost, we are demonstrating that we always make customer satisfaction a top priority. Customers can learn more about this topic at the Memory Component Issue web page.

PLEASE NOTE - The products listed in this Field Notice have lower than expected failure rates. This assessment is based on actual usage of affected memory components, observed field failure rates and product replacements since 2012.

A degraded component will not affect the ongoing operation of a device, but will be exposed by a subsequent power cycle event. This event will result in a hard failure of the device, which cannot be recovered by a reboot or additional power cycle. For these reasons, additional caution is recommended for operational activities requiring the simultaneous power cycling of multiple devices. This issue has been observed most commonly on devices that have been in service for 24 months or more.

Problem Symptoms

If a Nexus 7000 linecard listed in the Products Affected section has been in continuous operation for approximately 24 months, it might fail to boot up due to memory failure during a power cycle event. The failure can be triggered by one or more of these actions:

  • Upgrade the software
  • Reload the entire product
  • Reload after installation
  • Online Insertion Removal/Replacement (OIR)

Note: This issue does not affect boards while the boards are in operation. The board failure might occur after one or more of the actions listed are executed.

The observed card symptoms are shown here. The goal is to identify any error messages that differentiate boot up failures caused by the memory component from other boot up failures. If there is no error message, then no additional information is necessary.

Supervisor console messages are shown here:

2013 Nov 20 08:29:15 switch %$ VDC-1 %$ %PLATFORM-2-MOD_PWRUP: Module 1 
powered up (Serial number JAFxxxxxxx)
2013 Nov 20 08:29:15 switch %$ VDC-1 %$ %MODULE-2-MOD_FAIL: Initialization of
module 1 (serial: JAFxxxxxxx) failed
2013 Nov 20 08:29:16 switch %$ VDC-1 %$ %MODULE-2-MOD_FAIL: Initialization of
module 1 (serial: JAFxxxxxxx) failed
2013 Nov 20 08:29:16 switch %$ VDC-1 %$ %PLATFORM-2-MOD_PWRDN: Module 1
powered down (Serial number JAFxxxxxxx)

switch# show mod
Mod Ports Module-Type                      Model              Status
--- ----- -------------------------------- ------------------ ------------
1   0     10 Gbps Ethernet Module                             powered-dn
5   0     Supervisor module-1X             N7K-SUP1           active *

Mod Power-Status Reason
--- ------------ ---------------------------
1   powered-dn   Reset (powered-down) because module does not boot

This failure message can appear for other failures as well. Use the output of the show module internal activity module command in order to verify the failure is memory related.
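Because the MOD_FAIL message is not specific to memory failure, a script can only shortlist modules for the follow-up check. The following is a minimal Python sketch, assuming the supervisor console output has been saved as text. The function name and the regular expression are illustrative and are not part of any Cisco tool.

```python
import re

# Matches the MOD_FAIL syslog message shown in the console capture.
# \s+ tolerates the line wrap between "of" and "module".
MOD_FAIL_RE = re.compile(
    r"%MODULE-2-MOD_FAIL: Initialization of\s+module (\d+)\s+\(serial: (\S+)\)"
)


def modules_with_init_failure(console_text):
    """Return sorted, de-duplicated (module, serial) pairs that logged MOD_FAIL."""
    found = {
        (int(m.group(1)), m.group(2))
        for m in MOD_FAIL_RE.finditer(console_text)
    }
    return sorted(found)


# Sample text copied from the console messages in this notice.
sample = """\
2013 Nov 20 08:29:15 switch %$ VDC-1 %$ %MODULE-2-MOD_FAIL: Initialization of
module 1 (serial: JAFxxxxxxx) failed
2013 Nov 20 08:29:16 switch %$ VDC-1 %$ %MODULE-2-MOD_FAIL: Initialization of
module 1 (serial: JAFxxxxxxx) failed
"""

print(modules_with_init_failure(sample))
```

A module that appears in this list is only a candidate. The activity log for that module still has to be checked before the failure is treated as memory related.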

The output of the show module internal activity module command has POST codes. A 0xE2 value of PWR_MGMT_LCP_STATUS_REG indicates memory failure. An example is shown here:

switch# show module internal activity module x 
****************** module activity log ***************
1) At 999305 usecs after Thu Feb 20 07:22:55 2014
DEBUG: Non-MTS event
category_id(2), event_id(104), resource_id(1794)

2) At 999017 usecs after Thu Feb 20 07:22:55 2014
Holding current transaction
OP: MTS_OPC_PFM_MODULE_POWER_STATUS

- - - - - - - - - - - - - - - - - -
- - - - - - - - - - - - - - - - - -
- - - - - - - - - - - - - - - - - -
- - - - - - - - - - - - - - - - - -

6) At 998880 usecs after Thu Feb 20 07:22:55 2014
Received MTS_OPC_PFM_MODULE_POWER_STATUS from node 1281 Platform manager
Power status: Powered down
- - - - - - - - - - - - - - - - - -
- - - - - - - - - - - - - - - - - -
- - - - - - - - - - - - - - - - - -
- - - - - - - - - - - - - - - - - -

11) At 518989 usecs after Thu Feb 20 07:22:55 2014
PWR_MGMT_SCRATCH_A_REG(0x1) = 0xdb, PWR_MGMT_POST_CODE_REG(0xb) = 0xe2

12) At 518914 usecs after Thu Feb 20 07:22:55 2014
PWR_MGMT_REVISION_REG = 0x6e, PWR_MGMT_IO_CNTRL_REG = 0xc4
PWR_MGMT_TRIPLE_IO_STATUS_REG1 = 0x6e, PWR_MGMT_LCP_STATUS_REG=0xe2
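
The register check described above can be sketched in Python, assuming the command output has been captured as text. The function names are hypothetical, and the only rule applied is the one stated in this notice: a PWR_MGMT_LCP_STATUS_REG value of 0xE2 indicates memory failure.

```python
import re

MEMORY_FAILURE_CODE = 0xE2  # value this notice associates with memory failure

# Accepts both "REG=0xe2" and "REG = 0xe2" spacing.
LCP_STATUS_RE = re.compile(r"PWR_MGMT_LCP_STATUS_REG\s*=\s*(0x[0-9a-fA-F]+)")


def lcp_status_values(activity_log):
    """Return every PWR_MGMT_LCP_STATUS_REG value found in the log, as integers."""
    return [int(value, 16) for value in LCP_STATUS_RE.findall(activity_log)]


def indicates_memory_failure(activity_log):
    """True if any logged LCP status register value equals 0xE2."""
    return MEMORY_FAILURE_CODE in lcp_status_values(activity_log)


# Sample text copied from entry 12 of the example output in this notice.
sample = """\
12) At 518914 usecs after Thu Feb 20 07:22:55 2014
PWR_MGMT_REVISION_REG = 0x6e, PWR_MGMT_IO_CNTRL_REG = 0xc4
PWR_MGMT_TRIPLE_IO_STATUS_REG1 = 0x6e, PWR_MGMT_LCP_STATUS_REG=0xe2
"""

print(indicates_memory_failure(sample))
```

The comparison is done on the parsed integer rather than on the string, so 0xe2 and 0xE2 are treated the same.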

Workaround/Solution

Fix on Failure Replacement Guidelines: Request RMA product through normal service support channels.

If you need assistance in order to determine which hardware part(s) might need replacement, consult the error messages documented in the Problem Symptoms section.

For assistance with replacement part disposition, reference this table. In cases where replacing the memory DIMM and/or daughter card is not a viable option, a request may be made to replace the entire card.

Refer to this documentation for assistance on memory modules replacement:

Affected PID   Action                                          Replacement PID                 Quantity
N7K-M148GT-11  Replace entire card or baseboard Memory Module  N7K-M148GT-11 or N7K-DIMM-1GB=  1
N7K-M132XP-12  Replace entire card                             N7K-M132XP-12                   1

How To Identify Hardware Levels

Enter the show inventory command in order to obtain the Product ID (PID) of the Nexus 7000 linecard. If the CLI is not available, physically inspect the linecard in order to locate the PID.
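
When many chassis need to be checked, the PID lookup can be scripted. The following is a minimal Python sketch, assuming the show inventory output has been saved as text and that each PID appears after a "PID:" label. The sample text is a hypothetical illustration of that layout and is not taken from this notice. The affected PIDs come from the Products Affected section.

```python
import re

# PIDs listed in the Products Affected section of this notice.
AFFECTED_PIDS = {"N7K-M148GT-11", "N7K-M132XP-12"}

# Assumption: each PID follows a "PID:" label and ends at whitespace or a comma.
PID_RE = re.compile(r"PID:\s*([^\s,]+)")


def affected_pids(inventory_text):
    """Return the affected PIDs found in the inventory text, in order of appearance."""
    return [pid for pid in PID_RE.findall(inventory_text) if pid in AFFECTED_PIDS]


# Hypothetical sample; real output may differ in layout and fields.
sample = """\
NAME: "Slot 1", DESCR: "10 Gbps Ethernet Module"
PID: N7K-M132XP-12 , VID: V05 , SN: JAFxxxxxxx
NAME: "Slot 5", DESCR: "Supervisor module-1X"
PID: N7K-SUP1 , VID: V10 , SN: JAFxxxxxxx
"""

print(affected_pids(sample))
```

An empty list means no PID in the supplied text matched the Products Affected section. It does not confirm that the text was complete, so physical inspection remains the fallback when the CLI is not available.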

For More Information

If you require further assistance, or if you have any further questions regarding this field notice, please contact the Cisco Systems Technical Assistance Center (TAC) by one of the following methods:

Receive Email Notification For New Field Notices

Cisco Notification Service—Set up a profile to receive email updates about reliability, safety, network security, and end-of-sale issues for the Cisco products you specify.