
Cisco MGX 8800 Series Switches

CSCec49356 Anomaly Results in Card Failures on AXSM, PXM1E and PXM45 Running Any 3.0 or Early 4.0 Releases.


October 14, 2003



Products Affected

Sequential #

Random #

AXSM

All AXSM, AXSM/B, and AXSM-E models.

PXM1E

All PXM1E Models.

PXM45

All PXM45 Models.

 

Problem Description

A software anomaly causes AXSM, PXM1E or PXM45 cards to fail, resulting in loss of service. The anomaly depends on how long an AXSM, PXM1E or PXM45 has been running continuously without a reset, in either the active or the standby role. It surfaces once the card has been running for more than 248 days for AXSM and PXM1E, or 497 days for PXM45. If both the Active and Standby AXSM have been running for 248 days, both cards may fail and defeat 1:1 card-level Automatic Protection Switching (APS) or any other redundancy scheme.

For purposes of this Field Notice, the term AXSM refers collectively to AXSM, AXSM/B and AXSM-E cards of all models. The term PXM1E refers to all models of PXM1E cards. The term PXM45 refers to all models of PXM45.

Note: This anomaly is not present in release 2.0 or 2.1. The anomaly was introduced in release 3.0 and is present in releases 4.0.00 and 4.0.10 for the MGX Product Family.

Background

Necessary conditions for the problem to occur:

  1. The AXSM or PXM1E has been operational, without resets, for 248 days or more. For the PXM45 series of cards, the threshold is 497 days.

  2. The MGX must be operating with MGX Software Release 3.0.00, 3.0.10, 3.0.20, 3.0.23, 4.0.00 or 4.0.10.

  3. A triggering event occurs: a link on an AXSM or PXM1E port goes up and down, or a PXM45 switchover takes place. Other triggers may also cause this anomaly to manifest itself.
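As a rough illustration, the first two conditions can be expressed as a simple check. This is only a sketch: the function and constant names are hypothetical, the release strings are the ones listed in condition 2, and condition 3 (the trigger) is deliberately left out because the notice says other triggers may exist.

```python
# Illustrative sketch only; names are hypothetical, not part of any Cisco tool.

# Uptime thresholds in days, taken from condition 1.
UPTIME_THRESHOLD_DAYS = {"AXSM": 248, "PXM1E": 248, "PXM45": 497}

# Releases explicitly listed in condition 2.
AFFECTED_RELEASES = frozenset(
    {"3.0.00", "3.0.10", "3.0.20", "3.0.23", "4.0.00", "4.0.10"}
)


def conditions_met(card_type: str, uptime_days: float, release: str) -> bool:
    """Return True when conditions 1 and 2 both hold for a card.

    card_type is "AXSM" (covering AXSM, AXSM/B and AXSM-E), "PXM1E" or "PXM45".
    """
    threshold = UPTIME_THRESHOLD_DAYS[card_type]
    return uptime_days >= threshold and release.strip() in AFFECTED_RELEASES


if __name__ == "__main__":
    print(conditions_met("AXSM", 250, "3.0.20"))
    print(conditions_met("PXM45", 250, "4.0.10"))
```

A card for which this returns True is exposed to the anomaly as soon as a trigger occurs.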

Problem Symptoms

On the AXSM, the tVsiSlave process crashes and the card switches over to the redundant AXSM, if one is configured. On a PXM1E or PXM45 card, the CMTask crashes, but the crash does not result in a switchover to the redundant PXM1E or PXM45. For redundant pairs of AXSMs, if both the Active and the Standby have been operational for 248 or more days, both cards may fail and thus defeat any configured redundancy.

Network Maintenance Engineers will observe log entries indicating card failure as well as failure flags in the dspcds screen.

In the case of PXM1E, the NNI links on the PXM1E will fail and the MGX will cease passing traffic. As stated above, the PXM1E will not switch to redundant PXM1E.

In the case of the PXM45, when the CMTask crashes, the MGX may continue to pass traffic on established connections but new provisioning of connections or ports will be blocked. As with the PXM1E, the active PXM45 will not switch to the redundant PXM45.

Network users may drop sessions to remote resources and may not be able to access remote resources until the trouble is cleared.

Workaround/Solution

Preventive Workaround:

This workaround must be implemented prior to occurrence of the problem. See Recovery section below if the problem has already occurred.

Check tickGet on the AXSM, PXM1E or PXM45 shellconn. If the value is greater than 0x70000000 (0xF0000000 for PXM45), reset the standby card and then switch over. If redundancy is not enabled on the AXSM or PXM1E, schedule a maintenance outage in order to reset the card. tickGet reads a free-running 32-bit counter that is used for generating timestamps.
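To see how the hexadecimal thresholds relate to the 248-day and 497-day figures, the tick count can be converted to days of uptime. The notice does not state the tick rate; the sketch below assumes 100 ticks per second, a value chosen only because it makes half of the 32-bit range (0x80000000) come out at about 248.55 days and the full range (2^32) at about 497.10 days. The function name is hypothetical.

```python
# Illustrative arithmetic only.
# ASSUMPTION: the counter advances 100 times per second. The notice does not
# state the tick rate; this value is inferred from its 248/497-day figures.
TICKS_PER_SECOND = 100
SECONDS_PER_DAY = 86_400


def ticks_to_days(ticks: int) -> float:
    """Convert a raw tick count to days of continuous uptime."""
    return ticks / TICKS_PER_SECOND / SECONDS_PER_DAY


if __name__ == "__main__":
    for label, ticks in [
        ("example reading", 0x63F4FB9),
        ("AXSM/PXM1E reset threshold", 0x70000000),
        ("half of 32-bit range", 0x80000000),
        ("PXM45 reset threshold", 0xF0000000),
        ("full 32-bit range", 2**32),
    ]:
        print(f"{label:<28} {ticks:#x} -> {ticks_to_days(ticks):.2f} days")
```

Under that assumption, the 0x70000000 and 0xF0000000 thresholds each leave roughly a month of margin before the corresponding 248-day or 497-day point.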

Procedure for viewing tickGet:

  • cc to AXSM, PXM1E or PXM45

  • Type shellConn

  • Type tickGet

  • See the returned value:

    axsm1>tickGet
    tickGet
    value = 104812473 = 0x63f4fb9
    
  • If the hexadecimal value (0x63f4fb9 in the example above) is 0x80000000 or greater for the AXSM or PXM1E, or 0xF0000000 or greater for the PXM45, then the card is immediately at risk. In a redundant pair, reset the standby AXSM, PXM1E or PXM45 card immediately.

  • cc back to the active PXM45 or PXM1E.

  • For AXSM cards, use the command resetcd Standby_AXSM_Slot# to reset the standby AXSM. For PXM cards, use the command resetcd standby_PXM_slot#.

  • When the standby AXSM, PXM1E or PXM45 returns to standby mode, execute the command switchredcd fromSlot toSlot (for AXSM) or switchcc (for PXM). This activates the standby card and resets the previously active AXSM, PXM1E or PXM45.

  • If the AXSM, PXM1E or PXM45 in question is not part of a redundant pair, schedule a maintenance window as soon as possible to execute a card reset using the command resetcd AXSM_slot# or, for PXM1E, resetcd. This will result in an outage while the card resets.
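The threshold comparison in the procedure above can be sketched as a small script that reads captured tickGet output and classifies the card. This is a hedged illustration, not a Cisco tool: the function names and status strings are hypothetical, and the parser assumes the output format shown in the example (a line of the form "value = <decimal> = 0x<hex>"). Only the hexadecimal field is used, on the assumption that the decimal field may be printed as a signed number for large values.

```python
import re

# Thresholds taken from the workaround text.
RESET_RECOMMENDED_ABOVE = {"AXSM": 0x70000000, "PXM1E": 0x70000000, "PXM45": 0xF0000000}
AT_RISK_FROM = {"AXSM": 0x80000000, "PXM1E": 0x80000000, "PXM45": 0xF0000000}

# ASSUMPTION: output looks like "value = 104812473 = 0x63f4fb9".
_VALUE_RE = re.compile(r"value\s*=\s*-?\d+\s*=\s*0x([0-9a-fA-F]+)")


def parse_tick_get(output: str) -> int:
    """Extract the tick count from captured tickGet output (hex field only)."""
    match = _VALUE_RE.search(output)
    if match is None:
        raise ValueError("no 'value = <dec> = 0x<hex>' line found")
    return int(match.group(1), 16)


def classify(card_type: str, ticks: int) -> str:
    """Return 'at-risk', 'reset-recommended' or 'ok' (hypothetical labels)."""
    if ticks >= AT_RISK_FROM[card_type]:
        return "at-risk"
    if ticks > RESET_RECOMMENDED_ABOVE[card_type]:
        return "reset-recommended"
    return "ok"


if __name__ == "__main__":
    sample = "axsm1>tickGet\ntickGet\nvalue = 104812473 = 0x63f4fb9\n"
    ticks = parse_tick_get(sample)
    print(hex(ticks), classify("AXSM", ticks))
```

For the PXM45 the two thresholds coincide at 0xF0000000, so the sketch reports "at-risk" as soon as that value is reached.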

Additional Consideration:

It is possible that some MGX switches will be equipped with multiple AXSM cards. In these configurations, Cisco recommends that any Standby AXSM cards be reset first, during a maintenance window.

Recovery:

Reset the affected AXSM, PXM1E or PXM45 cards. This can be traffic-impacting in a non-redundant configuration. Use the command resetcd AXSM_slot# for the AXSM, or resetcd slot# for the PXM1E or PXM45.

Solution:

Upgrade the system software to release 4.0.12 or later. No 3.0 release containing this fix is planned; contact your Account Manager regarding the lack of a planned 3.0 release. Check the Software Center for availability of these releases. Login is required to download them.

DDTS

To follow the bug ID link below and see detailed bug information, you must be a registered user and you must be logged in.

DDTS

Description

CSCec49356 (registered customers only)

tVsiSlave on AXSM crashes on multiple nodes around the same time

 

For More Information

If you require further assistance, or if you have any further questions regarding this field notice, please contact the Cisco Systems Technical Assistance Center (TAC) by one of the following methods: