Cisco Network Modules

Field Notice: NM-CEs Running ACNS v5.0.16 and v5.1.10 or Earlier Are Susceptible to Uncorrectable ECC Errors and Data Address Mark Not Found Errors


Revised March 14, 2005

March 02, 2005


Products Affected

Product     Comments
NM-CE       Running ACNS versions earlier than 5.0.17 and 5.1.11

Problem Description

NM-CEs running versions of Cisco Application and Content Networking System (ACNS) earlier than v5.0.17 and v5.1.11 are susceptible to a failure mode in which a high number of load and unload cycles, where the disk head is moved on and off the platter, eventually results in disk failures. Symptoms include ECC uncorrectable errors and Data address mark not found errors.

Background

During normal operation, reads, like writes, originate in user space and trickle down to the disk. The disk firmware receives the read request, positions the head over the requested sector (spinning and seeking as necessary), and reads the data. It then performs an ECC check, corrects the data if necessary, and sends the data back. If the data is uncorrectable, the firmware reports an ECC error to ACNS. If the firmware encountered problems while reading but verified the data to be good, it may consider the sector to be in an impending failure state and choose to map a new sector in its place. It is not clear at the moment why the disks are running into this type of error.

The sectors that ran into ECC errors are not necessarily bad sectors or bad media. The error only indicates that the data on the media somehow became corrupted. Cisco was able to recover one of the NMs by writing to the same sector that was failing ECC, which stored new data in the sector with a new ECC. We need to understand from the drive vendor why, and under what conditions, the data on the media could become corrupted. The drive logs from the NMs also showed several WriteAbort Position and Writeabort P&V errors. These two errors are still being researched, but as currently understood they could be caused by vibration or by anything else that prevents the heads from aligning properly. Notably, these errors were reported on sectors close to the sectors that failed ECC.

Problem Symptoms

NM-CEs running ACNS v5.0.16 and v5.1.10 or earlier may be susceptible to a failure mode in which a high number of load and unload cycles eventually results in disk failures. Symptoms can include ECC uncorrectable errors and Data address mark not found errors.

Symptom:

An NM-CE may report disk errors over a period of time. In the errors seen so far, the ECC Uncorrectable error is the most prominent.

Example error messages:

%CE-SYS-5-900001: <4>hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=68915219, sector=61603856
%CE-SYS-5-900001: <4>end_request: I/O error, dev 03:06 (hda), sector 61603856
%CE-SYS-5-900001: <4>hda: dma_intr: error=0x01 { AddrMarkNotFound }, LBAsect=68915219, sector=61603858
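When reviewing a saved syslog for these symptoms, a small script can pick out the relevant lines. The following is a minimal sketch in Python, not a Cisco tool: the pattern is modeled only on the three sample messages above, and the names DISK_ERROR, WATCHED, and find_disk_errors are hypothetical. Message layout in other ACNS releases may differ.

```python
import re

# Pattern modeled on the sample messages in this notice; the exact field
# layout in other ACNS releases is an assumption.
DISK_ERROR = re.compile(
    r"(?P<dev>hd[a-z]): dma_intr: error=(?P<code>0x[0-9a-fA-F]+) "
    r"\{ (?P<name>\w+) \}, LBAsect=(?P<lba>\d+), sector=(?P<sector>\d+)"
)

# Error names called out in this field notice.
WATCHED = {"UncorrectableError", "AddrMarkNotFound"}


def find_disk_errors(lines):
    """Return (device, error name, sector) for each matching log line."""
    hits = []
    for line in lines:
        m = DISK_ERROR.search(line)
        if m and m.group("name") in WATCHED:
            hits.append((m.group("dev"), m.group("name"), int(m.group("sector"))))
    return hits


sample = [
    "%CE-SYS-5-900001: <4>hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=68915219, sector=61603856",
    "%CE-SYS-5-900001: <4>end_request: I/O error, dev 03:06 (hda), sector 61603856",
    "%CE-SYS-5-900001: <4>hda: dma_intr: error=0x01 { AddrMarkNotFound }, LBAsect=68915219, sector=61603858",
]
for hit in find_disk_errors(sample):
    print(hit)
```

Run against the sample messages, the sketch reports the UncorrectableError and AddrMarkNotFound lines and skips the generic end_request I/O error line, which carries no error name.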

Condition: This problem may be observed, after several weeks or months of operation, on an NM-CE running a version of ACNS that does not contain the fix for this DDTS.

Workaround/Solution

The workaround is to upgrade to a version of ACNS software that contains the fix. The fix has been integrated into ACNS 5.0.17 and later, 5.1.11 and later, and 5.2.1 and all later versions.
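The fixed-release rule can be checked mechanically when auditing several modules. The sketch below is illustrative only: parse_version and has_fix are hypothetical helpers, and it assumes the version is available as a string containing an x.y.z number such as "5.1.10". It also assumes that any release older than the 5.0 train lacks the fix, which follows from the "earlier than" wording in this notice but is not stated per release.

```python
import re


def parse_version(text):
    """Extract (major, minor, maintenance) from a string such as '5.1.11'."""
    m = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if m is None:
        raise ValueError(f"no x.y.z version found in {text!r}")
    return tuple(int(part) for part in m.groups())


def has_fix(text):
    """True if the release is at or above the fixed level for its train."""
    version = parse_version(text)
    train = version[:2]
    if train == (5, 0):
        return version >= (5, 0, 17)
    if train == (5, 1):
        return version >= (5, 1, 11)
    # 5.2.1 and all later versions carry the fix; anything older than the
    # 5.0 train is treated as affected (assumption, see lead-in).
    return version >= (5, 2, 1)


for release in ("5.0.16", "5.0.17", "5.1.10", "5.1.11", "5.2.1"):
    print(release, "fixed" if has_fix(release) else "affected")
```

Each train is compared against its own threshold because a plain numeric comparison would wrongly treat 5.1.10 as fixed on the grounds that it is later than 5.0.17.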

DDTS

To follow the bug ID link below and see detailed bug information, you must be a registered user and you must be logged in.

DDTS                                      Description
CSCef58506 (registered customers only)    NM-CE disk failures

For More Information

If you require further assistance, or if you have any further questions regarding this field notice, please contact the Cisco Systems Technical Assistance Center (TAC) by one of the following methods:
