Cisco 12000 Series Routers

Cisco 12000 Single Event Upset Failures Overview and Workaround Summary


August 15, 2003


Problem Description

Cisco 12000 line cards may reset after single event upset (SEU) failures. This field notice highlights some of those failures, why they occur, and what workarounds are available.

Background

Unlike hard errors, soft errors are spontaneous, non-recurring (transient), and non-reproducible. The error is called "soft" because:

  • The device functions normally after data is restored.

  • The transient error is present in data stored in memory devices on line cards.

  • The error is caused by system noise or by ionizing radiation.

SEU failures are often caused by the following:

  • Alpha particles emitted by radioactive packaging and wafer processing materials on synchronous random-access memory (SRAM) and dynamic random-access memory (DRAM) products.

  • Thermal neutrons from cosmic radiation with energy less than 15 eV.

  • Terrestrial high energy cosmic particles, neutrons, protons, pions and muons.

The chance for single event upset (SEU) failures in memory devices increases as densities rise and core voltages drop.

Cisco IOS performs error recovery: it detects soft errors and ensures that they do not adversely affect product performance. The methods used by IOS on the Cisco 12000 include:

  • ECC (Error Correction Code)

  • Replacement from backup data sources.

  • Hitless switchover to redundant line cards.
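
To illustrate how an error correction code can repair a single flipped bit, the following is a minimal Python sketch of a textbook Hamming(7,4) code. It is an illustration only and is not the ECC scheme implemented on Cisco 12000 line cards, which this notice does not describe. The function names are hypothetical.

```python
from typing import Tuple


def hamming74_encode(nibble: int) -> int:
    """Encode 4 data bits into a 7-bit Hamming codeword.

    Codeword positions are numbered 1..7; positions 1, 2, and 4 hold
    parity bits, and positions 3, 5, 6, and 7 hold the data bits.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]  # parity over positions 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]  # parity over positions 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]  # parity over positions 5, 6, 7
    bits = [p1, p2, d[0], p4, d[1], d[2], d[3]]  # list index i is position i + 1
    return sum(b << i for i, b in enumerate(bits))


def hamming74_decode(word: int) -> Tuple[int, int]:
    """Return (data, error_position); error_position is 0 if no error was seen.

    The syndrome is the XOR of the positions of all set bits. It is zero
    for a valid codeword and equals the position of a single flipped bit.
    """
    bits = [(word >> i) & 1 for i in range(7)]
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos - 1]:
            syndrome ^= pos
    if syndrome:
        bits[syndrome - 1] ^= 1  # correct the single-bit upset
    data = bits[2] | (bits[4] << 1) | (bits[5] << 2) | (bits[6] << 3)
    return data, syndrome


# Simulate a single event upset: one stored bit flips between write and read.
stored = hamming74_encode(0b1011)
upset = stored ^ (1 << 4)  # flip the bit at codeword position 5
data, error_position = hamming74_decode(upset)
assert data == 0b1011 and error_position == 5
```

A code of this kind corrects one flipped bit per codeword. Two flipped bits in the same codeword would be miscorrected, which is why practical memory ECC commonly adds an extra parity bit to also detect double-bit errors.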

Problem Symptoms

Line cards report memory parity errors or application-specific integrated circuit (ASIC) errors, which may result in a card reload with a two- to three-minute recovery. Data passes normally after the card reloads.

Workaround/Solution

Cisco IOS® Software Release 12.0(25)S and later releases include several SEU error recovery improvements for the Cisco 12000 series.

IOS releases 12.0(21)S6, 12.0(22)S4, 12.0(23)S2, 12.0(21)S1 and later include SEU failure fixes for Cisco 12000 Engine 3 based line cards. These improvements reduce the chance of a card reload due to SEU failures, reduce the reload time if a reload occurs, and provide better text messaging for the failure types.

For customers using Engine 3, 4, or 4+ based line cards, these IOS improvements have significantly reduced error recovery time to under three seconds.

Note: Customers should not replace hardware after a single SEU failure. The line card should be monitored for further instances. If additional failures occur, contact Cisco Technical Support.

DDTS

To follow the bug ID links below and see detailed bug information, you must be a registered user and you must be logged in.

DDTS                                      Description

CSCea34650 (registered customers only)    ISE: Parity Error recovery time

CSCea35822 (registered customers only)    ISE: Line card crash during parity error injection

CSCea35881 (registered customers only)    ISE: No consistancy between Tx and Rx SRAM64 fault injection

CSCeb13025 (registered customers only)    ISE: Alpha error results in card crash

CSCea57600 (registered customers only)    ISE: Line card cpu hit high utilization with low pps packet punt to cpu

Cisco IOS Versions Affected

Cisco IOS Software Releases earlier than 12.0(25)S for the 12000 series are more likely to have SEU error recovery problems.
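
As a rough aid for auditing an inventory against this threshold, the following Python sketch checks whether a release string predates 12.0(25)S. It assumes release strings of the form used in this notice, such as 12.0(22)S4, and handles only the S train; any other format raises an error. The function names are hypothetical, and the comparison does not account for the Engine 3 fixes that the notice says are carried in specific earlier rebuilds.

```python
import re
from typing import Tuple

# Matches strings such as "12.0(25)S" or "12.0(22)S4" (S train only).
_RELEASE_RE = re.compile(r"^(\d+)\.(\d+)\((\d+)\)S(\d*)$")


def parse_s_release(release: str) -> Tuple[int, int, int, int]:
    """Split an S-train release string into (major, minor, maintenance, rebuild)."""
    match = _RELEASE_RE.match(release.strip())
    if match is None:
        raise ValueError(f"unrecognized S-train release string: {release!r}")
    major, minor, maintenance, rebuild = match.groups()
    # A missing rebuild number (plain "S") is treated as rebuild 0.
    return int(major), int(minor), int(maintenance), int(rebuild or 0)


def predates_seu_improvements(release: str) -> bool:
    """Return True if the release is earlier than 12.0(25)S."""
    return parse_s_release(release)[:3] < (12, 0, 25)


for name in ("12.0(22)S4", "12.0(25)S", "12.0(25)S1"):
    print(name, predates_seu_improvements(name))
```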

How To Upgrade Software

To download Cisco IOS Software to upgrade your Cisco 12000, go to the Cisco Software Center on Cisco.com.

Additional Information About SEU

For additional information about Single Event Upset (SEU) failures, you can go to one of the following Cisco on-line documents:

For More Information

If you require further assistance, or if you have any further questions regarding this field notice, please contact the Cisco Systems Technical Assistance Center (TAC) by one of the following methods:

Receive Email Notification For New Field Notices

Product Alert Tool - Set up a profile to receive email updates about reliability, safety, network security, and end-of-sale issues for the Cisco products you specify.