Cisco MDS 9500 Series Multilayer Directors

Field Notice: FN - 63133 - MDS9000 - Generation 1 Linecards May Reload Unexpectedly When They Fail the OHMS Bootflash Test - Issue is Only Applicable to MDS Generation 1 Linecards Running SAN OS Code Prior to 3.3(1c)

Revised July 16, 2008

Initial release July 03, 2008


NOTICE:

THIS FIELD NOTICE IS PROVIDED ON AN "AS IS" BASIS AND DOES NOT IMPLY ANY KIND OF GUARANTEE OR WARRANTY, INCLUDING THE WARRANTY OF MERCHANTABILITY. YOUR USE OF THE INFORMATION ON THE FIELD NOTICE OR MATERIALS LINKED FROM THE FIELD NOTICE IS AT YOUR OWN RISK. CISCO RESERVES THE RIGHT TO CHANGE OR UPDATE THIS FIELD NOTICE AT ANY TIME.

Revision History

Revision  Date         Comment
1.1       16-JUL-2008  Problem description updated, Workaround / Solution Updated
1.0       03-JUL-2008  Initial Public Release

Products Affected

MDS9000 - DS-X9016
MDS9000 - DS-X9032
MDS9000 - DS-X9032-SMV
MDS9000 - DS-X9032-SSM
MDS9000 - DS-X9302-14K9
MDS9000 - DS-X9304-SMIP
MDS9000 - DS-X9308-SMIP

Problem Description

In rare cases, MDS Generation 1 linecards may fail the periodic Online Health Management System (OHMS) bootflash test. If the test fails while the linecard is in production and OHMS is configured to reload the linecard, user traffic through the linecard may be disrupted.

This issue is only applicable to MDS Generation 1 linecards running SAN OS code prior to 3.3(1c).
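The release threshold above can be checked in a script. The following is a minimal sketch, assuming SAN OS version strings always take the form major.minor(maintenance plus optional letter), as in 3.3(1c), and that releases sort by those four fields in order. The names `parse_sanos_version` and `is_affected_release` are hypothetical and not part of any Cisco tool.

```python
import re

# Assumed format: major.minor(maintenance + optional letter), e.g. "3.3(1c)".
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\((\d+)([a-z]?)\)$")

FIXED_RELEASE = "3.3(1c)"  # first fixed release named in this field notice


def parse_sanos_version(text):
    """Return (major, minor, maintenance, letter) for a SAN OS version string."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized SAN OS version: {text!r}")
    major, minor, maint, letter = match.groups()
    return (int(major), int(minor), int(maint), letter)


def is_affected_release(text):
    """True if the release sorts before the fixed release (assumed ordering)."""
    return parse_sanos_version(text) < parse_sanos_version(FIXED_RELEASE)
```

Under these assumptions, `is_affected_release("3.2(3a)")` is true and `is_affected_release("3.3(1c)")` is false. The linecard model must still be checked separately against the Products Affected list.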

Background

By default, the OHMS tests the bootflash in all modules. If this test fails, the OHMS failure action is to generate a callhome message and an exception log entry, and to reload the affected linecard immediately.

Since its release, the OHMS bootflash test failure action has included reloading the affected linecard. Starting with SAN OS 3.3(1c), the failure action no longer reloads the linecard.

Problem Symptoms

show logging log
%SYSTEMHEALTH-2-OHMS_BOOTFLASH_FAILED: Bootflash test maximum failures reached for module 3 Reason (1).

%MODULE-2-MOD_DIAG_FAIL: Module 3 (serial: ABC12345678) reported failure on ports 3/1-3/16 (Fibre Channel) due to System Health failure in device 45 (device error 0xc2d00101)

show module internal exceptionlog module 3
********* Exception info for module 3 ********

exception information --- exception instance 1 ----
Module Slot Number: 3
Device Id : 45
Device Name : Hard disk
Device Errorcode : 0xc2d00101
Device ID : 45 (0x2d)
Device Instance : 00 (0x00)
Dev Type (HW/SW) : 01 (0x01)
ErrNum (devInfo) : 01 (0x01)
System Errorcode : 0x4073002a System Health failure
Error Type : FATAL error
PhyPortLayer : Fibre Channel
Port(s) Affected : 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16
Error Description :
DSAP : 0 (0x0)
UUID : 0 (0x0)
Time : Thu May 29 00:11:28 2008
(Ticks: 483D6810 jiffies)


show system reset-reason module 3
*************** module reset reason (3) *************
Time stamp : At 627864 usecs after Thu May 29 00:11:29 2008

Service name : OHMS daemon
Reset reason : Runtime diagnostic failure => [Failures < MAX] : powercycle
Serial number: ABC12345678
Error code : NA
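The syslog symptom above can be searched for in a saved log. The following is a minimal sketch, assuming the log has been exported to text and that the message wording matches the sample shown in this notice; the function name `modules_with_bootflash_failure` is hypothetical.

```python
import re

# Matches the OHMS syslog message shown above and captures the module number.
_BOOTFLASH_FAILED_RE = re.compile(
    r"%SYSTEMHEALTH-2-OHMS_BOOTFLASH_FAILED: "
    r"Bootflash test maximum failures reached for module (\d+)"
)


def modules_with_bootflash_failure(log_text):
    """Return the sorted, de-duplicated module numbers reported in log_text."""
    found = {int(m.group(1)) for m in _BOOTFLASH_FAILED_RE.finditer(log_text)}
    return sorted(found)
```

Given the sample syslog line from this notice, the function returns `[3]`. A match only indicates that the message was logged; the exception log and reset reason should still be checked on the switch.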

Workaround/Solution

This issue is resolved in SAN OS 3.3(1c) and higher. Customers should contact their Reseller/OSM to identify a qualified version of SAN OS for their environment.

Customers who are not planning to upgrade to a fixed release of SAN OS can work around this issue by disabling the OHMS bootflash failure action for each Generation 1 linecard in their switch, using the procedure in Section A.

The following behavior occurs when the bootflash failure action for a linecard has been disabled:

1. The OHMS will not reload the linecard if the bootflash test on the linecard fails.

2. The linecard will continue to operate normally even when it has a bootflash issue.

3. Callhome alerts (if configured) and exception logging of this error will be disabled.

4. Syslog messages for this error will continue to be generated. 

Note: If the OHMS bootflash failure action has been manually disabled, this configuration must be reversed after upgrading to a fixed release. To re-enable callhome alerts and exception logging of this error, follow the procedure listed in Section B.

Customers who disable the OHMS bootflash failure action must follow the bootflash verification procedures detailed in Cisco Field Notice 63099 prior to performing a non-disruptive software upgrade or downgrade to the switch.

Disabling OHMS testing completely is not recommended. As long as the OHMS is still configured to run the bootflash tests, bootflash test syslog notifications and test pass/fail statistics will continue to be generated.

Note: If a linecard is replaced with another linecard of the same model, the OHMS configuration for that slot is preserved. If the new linecard is a different model, the OHMS configuration for that slot is purged and must be reapplied. To check the current OHMS configuration, follow the procedure listed in Section C.

--------------------------------------------------------------------------------

Section A.
Steps to disable the OHMS bootflash recovery action for module "x" (replace "x" with the actual module number):

switch# conf t
switch(config)# no system health module x bootflash failure-action
switch(config)# exit
switch# copy running startup
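On a switch with several Generation 1 linecards, the Section A commands have to be repeated per module. The following is a minimal sketch that builds the command list from a slot-to-model inventory; the inventory dictionary and the function name `workaround_commands` are hypothetical, and the sketch only generates text, it does not connect to a switch.

```python
# Generation 1 linecard models listed under "Products Affected".
AFFECTED_MODELS = {
    "DS-X9016", "DS-X9032", "DS-X9032-SMV", "DS-X9032-SSM",
    "DS-X9302-14K9", "DS-X9304-SMIP", "DS-X9308-SMIP",
}


def workaround_commands(modules):
    """Build the Section A commands for a {slot: model} inventory."""
    # Keep only slots whose model appears in the affected list.
    slots = sorted(s for s, model in modules.items() if model in AFFECTED_MODELS)
    if not slots:
        return []  # nothing to change, so emit no configuration session
    lines = ["conf t"]
    for slot in slots:
        lines.append(f"no system health module {slot} bootflash failure-action")
    lines += ["exit", "copy running startup"]
    return lines
```

For an inventory of `{1: "DS-X9016", 3: "DS-X9032"}`, the result is `conf t`, one `no system health module ... bootflash failure-action` line each for modules 1 and 3, then `exit` and `copy running startup`. Slots holding models outside the affected list are skipped.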

--------------------------------------------------------------------------------

Section B.
Steps to enable the OHMS bootflash recovery action for module "x" (replace "x" with the actual module number):

switch# conf t
switch(config)# system health module x bootflash failure-action
switch(config)# exit
switch# copy running startup

--------------------------------------------------------------------------------

Section C.
Steps to check current OHMS configuration for module "x" (replace "x" with the actual module number):

switch# show system health module x

DDTS

To follow the bug ID links below and see detailed bug information, you must be a registered customer and you must be logged in.

DDTS                                    Description
CSCsq55164 (registered customers only)  change ohms default bootflash failure recovery action to 'no action'
CSCsq53897 (registered customers only)  ohms may reset a generation 1 linecard with dev error 0xc2d00101

For More Information

If you require further assistance, or if you have any further questions regarding this field notice, please contact the Cisco Systems Technical Assistance Center (TAC).