
Cisco WAN Switching Modules

Field Notice #81-12: Model-F NTM[B] Trunk Card Issues on the IGX


December 31, 1999

Revised June 07, 2002


Problem Summary

Four problems may occur with the current hardware version (Rev. F) of the Model-F Network Trunk Module (NTM)[B] trunk card. These problems can also occur with earlier hardware versions of the card. Depending upon the circumstances, the card may experience one or more of the following:

  • Resets caused by a routing table checksum failure.

  • Resets caused by the queue manager reading a 0xFF GCI.

  • Reports of timestamped packet drops, even though there is no timestamped data being carried on the associated trunk.

  • Bit errors, resulting in frame discards and other data errors.

Problem Symptoms

Problem #1 (Routing table checksum failure):

The Model-F NTM[B] experiences a routing table checksum error, causing the card to go into alarm and to fail with the following "0B" hardware error string: 0B 00 06 96 01 44. This error string can be identified using the dspcderrs <slot> command. Unless the feature has been disabled in software, upon hardware failure, the card will automatically reset.

Problem #2 (Queue manager reads a GCI of 0xFF):

The Model-F NTM[B] finds a GCI of 0xFF, causing the card to go into alarm and to fail with the following "0B" hardware error string: 0B 00 06 96 02 0A 00 07 F8. This error string can be identified using the dspcderrs <slot> command. Unless the feature has been disabled in software, upon hardware failure, the card will automatically reset.
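
For operators who collect error strings from many cards, the two signatures above can be matched mechanically. The following is a minimal sketch: the two hex strings are copied from this notice, while the function names and the assumption that the string has already been extracted from the dspcderrs output by hand are illustrative only. The exact layout of dspcderrs output is not reproduced here.

```python
from typing import Optional

# The two "0B" error strings quoted in this notice. Everything else in
# this sketch (names, normalization rules) is a hypothetical convenience.
KNOWN_SIGNATURES = {
    "0B 00 06 96 01 44": "Problem #1: routing table checksum failure",
    "0B 00 06 96 02 0A 00 07 F8": "Problem #2: queue manager read a GCI of 0xFF",
}


def normalize(error_string: str) -> str:
    """Uppercase the hex bytes and collapse runs of whitespace."""
    return " ".join(error_string.upper().split())


def classify(error_string: str) -> Optional[str]:
    """Return the matching problem description, or None if unrecognized."""
    return KNOWN_SIGNATURES.get(normalize(error_string))


if __name__ == "__main__":
    print(classify("0b 00 06 96 01 44"))
    print(classify("0B 00 06 96 02 0A 00 07 F8"))
```

A string that matches neither signature returns None and should be investigated separately, because it does not correspond to a failure described in this notice.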

Problem #3 (Abundant timestamped transmit packet drops):

The Model-F NTM[B] reports abundant TS (timestamped) transmit packet drops. There may not be any timestamped data on the associated trunk, or anywhere else in the network, when this activity is observed.

Problem #4 (Bit errors):

The Model-F NTM[B] causes bit errors, resulting in frame discards and other data errors.

For more information, see Problem Details.

Platforms Affected

IGX

Releases Affected

Hardware only: NTM[B] Model F, hardware revision F (and lower)

Releases Containing the Fix

NTM[B] Model F, hardware revision H (and later)

Problem Details

The problems previously described all have their root causes in the areas of the cellbus state machine (CSM), processor access and arbitration logic. Selected cases of processor access to the CSM were found to be asynchronous, which, in turn, caused glitches which could create SRAM timing and specification violations.

Problem #1 (Routing table checksum failure)

The NTM contains a routing table in SRAM, written by NTM firmware under switch software control, and read by a hardware state machine to make a packet acceptance decision. The checksum of the routing table is verified every two hours, so in the worst case an error goes undetected for up to two hours after it occurs. An error in the routing table could potentially occur when connections are added to or removed from an NTM trunk (through the addition of new connections, reroutes of existing connections, card resets, and other events which cause connections to be added or removed). When the table is next verified, the error produces a checksum failure and, in turn, the declaration of a card failure.

Experience so far indicates that a trunk which has 100 connections added to it has approximately a 50% chance of experiencing such a failure. This is true when the IGX node of interest has at least four T1 trunks loaded to more than 50% utilization. The frequency of failure rises and falls roughly linearly with the load on the bus.

Problem #2 (Queue manager reads a GCI of 0xFF)

The GCI is originally stored in the NTM routing table. The hardware state machine reads the value and stores it in a FIFO for the queue manager. Firmware should not have stored the 0xFF value in the routing table. On occasion, the resulting problem has also been accompanied by a timestamped packet drop trunk error.

Experience so far indicates that a trunk which has 100 connections added to it has approximately a 10% chance of experiencing such a failure. This is true when the IGX node of interest has at least four T1 trunks loaded to more than 50% utilization. The frequency of failure rises and falls roughly linearly with the load on the bus.
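
The figures for Problems #1 and #2 are quoted per 100 added connections. To get a rough feel for a change of a different size, the sketch below rescales them. It rests on an assumption that is not made anywhere in this notice: each added connection is treated as an independent trial with the same failure chance. The notice also states that failure frequency varies with bus load, so the results are order-of-magnitude estimates at best. The function name is hypothetical.

```python
def scaled_chance(chance_per_100: float, connections: int) -> float:
    """Rescale a per-100-connection failure chance to another count.

    ASSUMPTION (not from the field notice): every added connection is an
    independent trial with identical failure probability, so the chance
    of no failure after n connections is (1 - chance_per_100) ** (n / 100).
    """
    if not 0.0 <= chance_per_100 < 1.0:
        raise ValueError("chance_per_100 must be in [0, 1)")
    if connections < 0:
        raise ValueError("connections must be non-negative")
    return 1.0 - (1.0 - chance_per_100) ** (connections / 100.0)


if __name__ == "__main__":
    # 0.50 and 0.10 are the approximate figures quoted for Problems #1 and #2.
    for label, chance in (("Problem #1", 0.50), ("Problem #2", 0.10)):
        for n in (20, 100, 200):
            print(f"{label}: {n:3d} connections -> {scaled_chance(chance, n):.3f}")
```

Under this model, doubling the number of added connections from 100 to 200 raises the Problem #1 estimate from 50% to 75%, not to 100%.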

Problem #3 (Abundant timestamped packet drops)

The dsptrkerrs and dsptrkerrs <slot> commands may show abundant timestamped transmit packet drops, even when there is no timestamped data on the associated trunk. This problem could potentially occur when connections are added to or removed from an NTM trunk (through the addition of new connections, reroutes of existing connections, card resets, and other events which cause connections to be added or removed).

Experience so far indicates that a trunk which has 100 connections added to it has approximately a 50% chance of experiencing such a failure. This is true when the IGX node of interest has at least four T1 trunks loaded to more than 50% utilization. The frequency of failure rises and falls roughly linearly with the load on the bus. Under the conditions described, the number of timestamped packet drops reported is in the range of 100k per minute.

Problem #4 (Bit errors)

Packet lines based upon Model-F NTM[B] trunk cards may introduce bit errors, causing errors on data circuits, egress frame discards on frame relay ports, or other data circuit errors. These errors are most likely to occur while a trunk is being configured, such as during massive reroute activity. The bit error rate observed has been as high as 10E-4, and approaches zero when no network configuration is taking place.
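
To put the quoted bit error rate in perspective, the sketch below estimates its effect on a T1 trunk. It makes three assumptions that do not come from this notice: "10E-4" is read as one errored bit in 10^4, the full nominal T1 line rate of 1.544 Mbit/s is used as an upper bound on traffic, and bit errors are treated as independent, although errors caused by glitches may well arrive in bursts. The 1000-byte frame size is purely illustrative.

```python
T1_LINE_RATE_BPS = 1_544_000  # nominal T1 line rate, bits per second
ASSUMED_BER = 1e-4            # ASSUMPTION: "10E-4" read as 10^-4


def expected_bit_errors(ber: float, bit_rate_bps: float, seconds: float) -> float:
    """Expected number of errored bits over the given interval."""
    return ber * bit_rate_bps * seconds


def frame_error_probability(ber: float, frame_bytes: int) -> float:
    """Chance that a frame holds at least one errored bit.

    ASSUMPTION: bit errors are independent; bursty errors would damage
    fewer frames than this estimate suggests.
    """
    return 1.0 - (1.0 - ber) ** (frame_bytes * 8)


if __name__ == "__main__":
    per_second = expected_bit_errors(ASSUMED_BER, T1_LINE_RATE_BPS, 1.0)
    print(f"expected errored bits per second: {per_second:.1f}")
    # 1000 bytes is a hypothetical frame size chosen for illustration.
    print(f"frame error probability: {frame_error_probability(ASSUMED_BER, 1000):.2f}")
```

Under these assumptions a fully loaded T1 sees on the order of 150 errored bits per second, and roughly half of all 1000-byte frames would contain at least one error, which is consistent with the frame discards described above.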

Workarounds

No switch software or firmware workarounds for the hardware problems are currently available.

Identifying Model-F NTM[B] Trunk Cards

There are two methods to identify Model-F NTM[B]s and their hardware revisions:

  1. From a physical point of view, the Model-F NTM[B], which is a "native" IGX Card, does not have an Adapter Card Module (ACM) attached to the main printed circuit board assembly (PCBA). The Model-F NTM[B] part number, located near the bottom of the PCBA, just behind the faceplate, will be 214400-xx. Just to the right of the part number, the hardware revision will be displayed.

  2. If the NTM in question is installed in an IGX, perform a dspcd <slot> command for the NTM. A Model-F NTM[B] card will have a Model F identifier: "Fxx". The hardware revision of the card will be the middle character identifier; for example, a Model-F card at hardware revision E will show up as FEx. The last character is the NTM's firmware revision.
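
The three-character identifier described in method 2 can be decoded programmatically when auditing many cards. The sketch below is a hedged illustration: the function and type names are hypothetical, the identifier is assumed to have been copied by hand from the dspcd output, and "lower" or "later" revisions are assumed to follow alphabetical order. This notice does not mention hardware revision G, so that case is reported as undetermined.

```python
from typing import NamedTuple


class CardId(NamedTuple):
    model: str   # first character, e.g. "F" for Model F
    hw_rev: str  # middle character, hardware revision
    fw_rev: str  # last character, firmware revision


def parse_card_id(identifier: str) -> CardId:
    """Split a three-character identifier such as "FEx" into its parts."""
    ident = identifier.strip().upper()
    if len(ident) != 3 or not ident[:2].isalpha():
        raise ValueError(f"unexpected identifier: {identifier!r}")
    return CardId(ident[0], ident[1], ident[2])


def upgrade_status(card: CardId) -> str:
    """Classify a card against the revisions named in this notice.

    ASSUMPTION: hardware revisions are ordered alphabetically.
    """
    if card.model != "F":
        return "not a Model-F card"
    if card.hw_rev <= "F":
        return "affected"
    if card.hw_rev >= "H":
        return "fixed"
    return "undetermined"  # revision G is not covered by the notice


if __name__ == "__main__":
    for ident in ("FEx", "FFA", "FHA"):
        print(ident, "->", upgrade_status(parse_card_id(ident)))
```

For the example given above, a card reported as FEx decodes to Model F at hardware revision E and is classified as affected.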

Recommendations

Upgrade the Model-F NTM[B] card to hardware revision H. This activity will result in the changing of an electrically programmable logic device (EPLD) on the card. Replacement of this soldered, 100-pin device requires a return of the associated Model-F NTM[B] to Cisco for hardware upgrade. Reprogramming of the existing EPLD cannot be accomplished through a remote reprogramming mechanism. The EPLD changes make all processor accesses to the CSM synchronous, resolving the issues with SRAM timing and specification violations.

Upgrading Model-F NTM[B] Trunk Cards

Cisco will fulfill Model-F NTM[B] trunk card upgrades with both NTM Model-E and NTM[B] Model-F cards; these boards are functionally equivalent. For most customers, a delay in upgrading their Model-F NTM[B] cards may not matter because the Model-F NTM[B] reset problems are rare.

Customers should ship their down-rev Model-F NTM[B] cards back to Cisco within two weeks of receiving the upgraded Model-F NTM[B] cards.

For More Information

If you require further assistance, or if you have any further questions regarding this field notice, please contact the Cisco Systems Technical Assistance Center (TAC).
