Cisco - Field Notice: FN - 63091 - ESR PRE2 Switchover Without Log Event - PCMCIA Flash Card Fault - Hardware - Fix on Failure
Revised June 11, 2008
May 06, 2008
THIS FIELD NOTICE IS PROVIDED ON AN "AS IS" BASIS AND DOES NOT IMPLY ANY KIND OF GUARANTEE OR WARRANTY, INCLUDING THE WARRANTY OF MERCHANTABILITY. YOUR USE OF THE INFORMATION ON THE FIELD NOTICE OR MATERIALS LINKED FROM THE FIELD NOTICE IS AT YOUR OWN RISK. CISCO RESERVES THE RIGHT TO CHANGE OR UPDATE THIS FIELD NOTICE AT ANY TIME.
June 11, 2008: Added DDTS references to DDTS and Workaround/Solution sections.
May 06, 2008: Initial Public Release
Any version may have the specific PCMCIA card loaded.
The Active PRE2 may reset under heavy read-write load, causing failover to the Standby PRE2 without a crash info file.
This happens with PCMCIA Flash cards from a particular vendor lot when the customer frequently writes configuration updates to the flash card. The resulting multiple sequential writes can cause the card to stop responding.
This condition is relatively rare and occurs only under rapid, repeated write operations to the specified PCMCIA card lot.
This issue was originally thought to be a rare software issue. Subsequent engineering research determined that the root cause is multiple sequential writes, where an additional write starts before a previous operation concludes. This leads to a watchdog timer issue that in rare cases may crash the PRE2.
The PCMCIA flash vendor was unable to identify the root cause of this issue or any functional difference between the affected version of the card (SM9PC128M5SMM01) and the current flash devices sold under Cisco product ESR-PRE-MEM-FD128. The suspect card version is obsolete and no longer being produced.
A large carrier with a network design that relies on frequent configuration updates of a very large file has encountered about a dozen such failures, dating back to June 2007. This issue has not been reported in smaller networks with smaller configuration files.
Cisco was able to reproduce this failure only by continuous writes, on the order of several thousand to tens of thousands of iterations before failure. This event is considered very rare for severity impact considerations.
The PRE2 is not at fault. This condition is caused by a slow write operation on a specific lot of externally accessible PCMCIA cards (Cisco part number ESR-PRE-MEM-FD128).
The customer will observe that the former Standby PRE2 has become the Active PRE2. While this switchover is happening, network traffic will be interrupted. When RPR+ is the configured HA mode and the configuration is large, this traffic interruption can last several minutes, causing loss of routes and sessions until they are re-established.
If the customer has a console on the PRE2, they will see a message as follows:
System Bootstrap, Version 12.0(20020314:211744) [REL-pulsar_sx.ios-rommon 112], DEVELOPMENT SOFTWARE
Copyright (c) 1994-2002 by cisco Systems, Inc.
Reset Reason Register = RESET_REASON_L2_WATCHDOG (0x3c)
C10000 platform with 1044480 Kbytes of main memory
rommon 2 >
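The console capture above can be checked automatically when console logs are collected centrally. The following Python sketch is only an illustration and is not part of this field notice: the function names are hypothetical, and the pattern assumes the reset-reason line appears in exactly the form shown in the capture.

```python
import re

# Matches a line of the form shown in the console capture above, e.g.
#   Reset Reason Register = RESET_REASON_L2_WATCHDOG (0x3c)
# Assumption: other ROMMON versions print the line in the same format.
RESET_REASON_RE = re.compile(
    r"Reset Reason Register\s*=\s*(?P<name>\w+)\s*\((?P<code>0x[0-9a-fA-F]+)\)"
)


def find_reset_reason(console_text):
    """Return (name, code) for the first reset-reason line, or None."""
    match = RESET_REASON_RE.search(console_text)
    if match is None:
        return None
    return match.group("name"), int(match.group("code"), 16)


def is_l2_watchdog_reset(console_text):
    """True if the capture reports the watchdog reset described in this notice."""
    reason = find_reset_reason(console_text)
    return reason is not None and reason[0] == "RESET_REASON_L2_WATCHDOG"
```

A match only indicates that the reset reason is the same as in the capture above; the flash card part number still has to be checked before concluding that this field notice applies.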
The chance of encountering this problem will be reduced by minimizing the number of files stored on the PCMCIA Flash Card, and by minimizing the number of writes to any files that are stored there.
This can be done by storing very large configurations on a network file system rather than on the PCMCIA Flash Card.
Customers who have a PCMCIA flash card that exactly matches the identification section should RMA the PCMCIA Flash Card as a preventative measure.
There are no other PCMCIA Flash Cards identified that are subject to this issue. Other flash cards from this vendor and other vendors are not subject to this issue.
Note: The PRE2 should not be RMA'd as it is not causing the issue. The issue is specific to the write speed of the specific PCMCIA Flash Cards.
Before removing the flash card, verify that the card is not in use, because removing a card that is in use can crash Cisco IOS. Remove the flash card quickly, wait for a few seconds, and then reinsert the flash card quickly.
DDTS References: CSCsi46184, CSCsi26038, and CSC98596.
To follow the bug ID link below and see detailed bug information, you must be a registered customer and you must be logged in.
How To Identify Hardware Levels
All known failure instances have occurred on PCMCIA Flash cards with the Cisco internal part number 16-2117-01 and matching vendor part number SM9PC128M5SMM01. These were supplied as Cisco part number ESR-PRE-MEM-FD128 (128 MB) external flash for the ESR platform.
Cisco testing was also able to cause the event on a 64 MB PCMCIA card manufactured by the suspect vendor in the same time period. The Cisco internal part number is 16-2733-01 and the vendor part number is SM9FLAPC64M501. These cards were supplied as Cisco part number 10000-PREMEMFD64 (64 MB) external flash for the ESR platform. No customer-reported cases are known on the 64 MB product.
Note: No currently approved ESR-PRE-MEM-FD128 (128 MB) device (vendor part numbers CPE128MCGS1MB00U and CPE064MCGS1MB01U) is subject to this issue. Cisco was also unable to reproduce the problem with the latest, pre-qualified PCMCIA Flash card from the vendor, planned for future use on the ESR platform.
To emphasize this, customers have only reported problems with SM9PC128M5SMM01. The 10000-PREMEMFD64 (64 MB) version is included as a precaution because it can be made to fail in the lab.
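For sites that keep an inventory of flash cards, the part numbers in this section can be turned into a simple lookup. The Python sketch below is illustrative only: the function name, the status strings, and the idea of feeding it a vendor part number read from the card label are assumptions, while the part numbers themselves are taken from this notice.

```python
# Vendor part numbers named in this field notice, mapped to
# (Cisco internal part number, Cisco orderable part number, note).
AFFECTED_CARDS = {
    "SM9PC128M5SMM01": ("16-2117-01", "ESR-PRE-MEM-FD128",
                        "all customer-reported failures"),
    "SM9FLAPC64M501": ("16-2733-01", "10000-PREMEMFD64",
                       "reproduced in Cisco lab testing only"),
}

# Vendor part numbers this notice lists as not subject to the issue.
NOT_AFFECTED_CARDS = {"CPE128MCGS1MB00U", "CPE064MCGS1MB01U"}


def classify_card(vendor_part_number):
    """Classify a card as 'affected', 'not affected', or 'not listed'."""
    part = vendor_part_number.strip().upper()
    if part in AFFECTED_CARDS:
        return "affected"
    if part in NOT_AFFECTED_CARDS:
        return "not affected"
    # Part numbers that do not appear in this notice are reported
    # separately instead of being guessed at.
    return "not listed"
```

For example, `classify_card("SM9PC128M5SMM01")` returns `"affected"`, which corresponds to the card the notice recommends returning through RMA.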
For More Information
If you require further assistance, or if you have any further questions regarding this field notice, please contact the Cisco Systems Technical Assistance Center (TAC).