Cisco IOS Software Releases 12.3 Mainline

QA Error Recovery for the Cisco 7500 Series


When a QA error condition is triggered on a Cisco 7500 series router with a Route Switch Processor (RSP), a cbus complex is initiated and the line cards are reloaded. The QA error recovery mechanism reduces the downtime from about 300 seconds to less than 1 second when the error is caused by the same buffer header being detected in more than one queue. The QA error condition is specific to the Cisco 7500 series routers.

Feature History for the QA Error Recovery for the Cisco 7500 Series Feature

Release        Modification
12.1(19)E      This feature was introduced.
12.0(24)S1     This feature was integrated into Cisco IOS Release 12.0(24)S1.
12.2(15)T5     This feature was integrated into Cisco IOS Release 12.2(15)T5.
12.2(18)S      This feature was integrated into Cisco IOS Release 12.2(18)S.
12.0(26)S      This feature was integrated into Cisco IOS Release 12.0(26)S.
12.3(6)        This feature was integrated into Cisco IOS Release 12.3(6).
12.3(7)T       This feature was integrated into Cisco IOS Release 12.3(7)T.


Finding Support Information for Platforms and Cisco IOS Software Images

Use Cisco Feature Navigator to find information about platform support and Cisco IOS software image support. Access Cisco Feature Navigator at http://www.cisco.com/go/fn. You must have an account on Cisco.com. If you do not have an account or have forgotten your username or password, click Cancel at the login dialog box and follow the instructions that appear.

Contents

Prerequisites for QA Error Recovery for the Cisco 7500 Series

Restrictions for QA Error Recovery for the Cisco 7500 Series

Information About QA Error Recovery for the Cisco 7500 Series

How to Configure QA Error Recovery for the Cisco 7500 Series

Configuration Examples for QA Error Recovery for the Cisco 7500 Series

Additional References

Command Reference

Glossary

Prerequisites for QA Error Recovery for the Cisco 7500 Series

An image that supports the QA error recovery mechanism must be running on the router.

If Cisco IOS Release 12.0(24)S1 is running, the QA error recovery mechanism must be explicitly enabled.

Restrictions for QA Error Recovery for the Cisco 7500 Series

After three QA error conditions caused by duplicate queued buffer headers occur, the cbus complex is initiated and the line cards reload. After the line card reload, recovery is possible for another three QA errors. If an event occurs that triggers a MEMD recarve, such as a change in the maximum transmission unit (MTU), QA error recovery can handle a maximum of three QA errors after the MEMD recarve. Other QA error conditions, such as a null buffer header on any queue, can also occur. Recovery is not possible in these cases, and the QA error triggers a cbus complex and subsequent line card reloads.
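The recovery limit described above can be pictured as a small counter. The following Python sketch is illustrative only: the class and method names are hypothetical and do not correspond to Cisco IOS internals. It assumes that three duplicate-header errors are recovered, that the next one causes a cbus complex, and that both a line card reload and a MEMD recarve restore the budget of three.

```python
class QaRecoveryBudget:
    """Toy model of the three-recovery limit (hypothetical; not Cisco IOS code)."""

    MAX_RECOVERIES = 3  # limit stated in this document

    def __init__(self):
        self.recoveries = 0      # recoveries since the last reload or recarve
        self.cbus_complexes = 0  # modeled cbus complex events (line cards reload)

    def on_qa_error(self, duplicate_header):
        """Return the modeled action for one QA error."""
        if duplicate_header and self.recoveries < self.MAX_RECOVERIES:
            self.recoveries += 1
            return "recovered"
        # Unrecoverable error type (for example, a null buffer header),
        # or the budget is exhausted: model a cbus complex.
        self.cbus_complexes += 1
        self.recoveries = 0  # assumption: the line card reload restores the budget
        return "cbus complex"

    def on_memd_recarve(self):
        """A MEMD recarve (for example, after an MTU change) restores the budget."""
        self.recoveries = 0


budget = QaRecoveryBudget()
actions = [budget.on_qa_error(duplicate_header=True) for _ in range(4)]
print(actions)  # ['recovered', 'recovered', 'recovered', 'cbus complex']
```

In this model, a null buffer header (duplicate_header=False) always results in a cbus complex, regardless of how many recoveries remain.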

Information About QA Error Recovery for the Cisco 7500 Series

This section contains information to help you understand the QA error recovery mechanism:

QA Error Recovery for the Cisco 7500 Series Routers

QA Error Recovery for the Cisco 7500 Series Routers

QA errors are sometimes seen in heavy traffic situations and may indicate a hardware failure or a software bug. In the case of a hardware failure, a Versatile Interface Processor (VIP) or a Route Switch Processor (RSP) must be replaced. It is possible, however, to recover from a QA error and not see another error for months. When the same buffer header is present in two different queues, the QA ASIC goes into an error condition and triggers a QA error interrupt. The QA error interrupt causes the RSP to dump the QA diagnostics and perform a cbus complex during which all the line cards are reloaded. Although the duplicate buffer header condition does not always indicate a hardware failure, the downtime of up to 300 seconds creates a real problem in the network.

The hw-module main-cpu qa error-recovery command enables a recovery mechanism for this QA error: the router removes the duplicate buffer header from all the queues that contain it, and the buffer header is then counted as lost. The show controllers cbus command displays the number of lost buffer headers and the number of QA error recoveries. With QA error recovery, the downtime is reduced to less than one second under lab conditions. Three QA errors caused by duplicate buffer headers are permitted before the router performs a cbus complex and reloads all the line cards.
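The removal step can be sketched in a few lines of Python. This is a conceptual illustration under stated assumptions, not the router's implementation: queues are modeled as plain lists of buffer header values, and the function name is hypothetical. The queue IDs 0x2E and 0x340 and the header 0xE360 mirror the sample log messages in this document; the other header values are made up.

```python
from typing import Dict, List


def remove_duplicate_headers(queues: Dict[int, List[int]]) -> List[int]:
    """Remove every buffer header that sits on more than one queue.

    Mutates `queues` in place and returns the removed ("lost") headers.
    """
    # Map each header to the set of queues on which it appears.
    owners: Dict[int, set] = {}
    for queue_id, headers in queues.items():
        for header in headers:
            owners.setdefault(header, set()).add(queue_id)

    # A header owned by two or more queues is a duplicate.
    lost = sorted(h for h, ids in owners.items() if len(ids) > 1)

    # Remove each duplicate from all queues that contain it.
    for queue_id in queues:
        queues[queue_id] = [h for h in queues[queue_id] if h not in lost]
    return lost


queues = {0x2E: [0xA000, 0xA020, 0xE360], 0x340: [0xE360]}
lost = remove_duplicate_headers(queues)
print([hex(h) for h in lost], queues[0x340])  # ['0xe360'] []
```

After the call, the duplicate header is gone from both queues, which corresponds to the header being counted as lost rather than returned to either queue.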

How to Configure QA Error Recovery for the Cisco 7500 Series

This section contains the following tasks:

Displaying QA Error Recovery Information (optional)

Disabling QA Error Recovery (optional)

Displaying QA Error Recovery Information

Perform this optional task to display details about the QA error recovery mechanism.

SUMMARY STEPS

1. enable

2. show controllers cbus

DETAILED STEPS


Step 1 enable

Enables privileged EXEC mode. Enter your password if prompted.

Router> enable

Step 2 show controllers cbus

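Use this command to display the number of lost buffer headers and the number of QA error recoveries. In the following sample output, both counters appear in the lost/qaerror recoveries field.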

Router# show controllers cbus

MEMD at E0000000, 2097152 bytes (unused 1728, recarves 1962, lost/qaerror recoveries 0/1)

When a QA error occurs, the following extra logs are displayed as well as the normal logs. If the recovery is successful, a message is displayed reporting that the QA error recovery was successful.

.Feb  3 22:17:35 GMT-4: %QA-3-DIAG: Trying to recover from QA ERROR.
.Feb  3 22:17:35 GMT-4: %QA-3-DIAG: Removing buffer header 0xE360 from all queues
.Feb  3 22:17:35 GMT-4: %QA-3-DIAG: Buffer 0xE360 is element 155 on queue 0x2E
.Feb  3 22:17:35 GMT-4: %QA-3-DIAG:     Queue 0x2E (48000170) has 154 elements
.Feb  3 22:17:35 GMT-4: %QA-3-DIAG: Buffer 0xE360 is element 1 on queue 0x340
.Feb  3 22:17:35 GMT-4: %QA-3-DIAG:     Queue 0x340 (48001A00) has 0 elements
.Feb  3 22:17:35 GMT-4: %QA-3-DIAG: At least one QA queue is broken
.Feb  3 22:17:35 GMT-4: %QA-3-DIAG: Recovered from QA ERROR.
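When the output of show controllers cbus is collected by a script, the counters on the MEMD line in Step 2 can be extracted with a regular expression. The following Python sketch assumes the line has exactly the layout shown in the sample output; the function name is hypothetical, and other releases may format the line differently.

```python
import re

# Pattern written against the sample MEMD line in this document.
MEMD_RE = re.compile(
    r"MEMD at (?P<addr>[0-9A-Fa-f]+), (?P<size>\d+) bytes "
    r"\(unused (?P<unused>\d+), recarves (?P<recarves>\d+), "
    r"lost/qaerror recoveries (?P<lost>\d+)/(?P<recoveries>\d+)\)"
)


def parse_memd_line(output):
    """Return the MEMD counters as integers, or None if no MEMD line is found."""
    match = MEMD_RE.search(output)
    if match is None:
        return None
    return {
        "recarves": int(match["recarves"]),
        "lost": int(match["lost"]),
        "recoveries": int(match["recoveries"]),
    }


sample = ("MEMD at E0000000, 2097152 bytes "
          "(unused 1728, recarves 1962, lost/qaerror recoveries 0/1)")
counters = parse_memd_line(sample)
print(counters)  # {'recarves': 1962, 'lost': 0, 'recoveries': 1}
```

Returning None instead of raising an exception lets a caller distinguish output that lacks the MEMD line from output whose counters are zero.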

Disabling QA Error Recovery

QA error recovery is enabled by default on all supported images for Cisco 7500 series routers except Cisco IOS Release 12.0(24)S1, on which it is disabled by default. To enable the router to recover from a QA error when using a Cisco IOS Release 12.0(24)S1 image, use the hw-module main-cpu qa error-recovery command. Perform this optional task if you need to disable the QA error recovery mechanism.

SUMMARY STEPS

1. enable

2. configure terminal

3. no hw-module main-cpu qa error-recovery

4. exit

DETAILED STEPS

 
Step 1 enable

Enables privileged EXEC mode. Enter your password if prompted.

Router> enable

Step 2 configure terminal

Enters global configuration mode.

Router# configure terminal

Step 3 no hw-module main-cpu qa error-recovery

Disables the QA error recovery mechanism on a Cisco 7500 series router.

Router(config)# no hw-module main-cpu qa error-recovery

Note The hw-module main-cpu qa error-recovery command is enabled by default on all supported releases except Cisco IOS Release 12.0(24)S1.

Step 4 exit

Exits global configuration mode and returns to privileged EXEC mode.

Router(config)# exit
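The interaction between the release default and the explicit command can be summarized in a short Python sketch. The function and constant names are hypothetical, and the sketch assumes the release string identifies an image that supports the feature; it does not check feature support.

```python
from typing import Optional

# The only release this document names as having recovery disabled by default.
DISABLED_BY_DEFAULT = {"12.0(24)S1"}


def recovery_enabled(release: str, configured: Optional[bool] = None) -> bool:
    """Return the modeled state of QA error recovery.

    configured: True  -> hw-module main-cpu qa error-recovery was entered
                False -> the no form of the command was entered
                None  -> neither form was entered; the release default applies
    """
    if configured is not None:
        return configured
    return release.strip() not in DISABLED_BY_DEFAULT


print(recovery_enabled("12.3(6)"))           # True
print(recovery_enabled("12.0(24)S1"))        # False
print(recovery_enabled("12.0(24)S1", True))  # True
```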

Configuration Examples for QA Error Recovery for the Cisco 7500 Series

This section contains the following examples:

QA Error Recovery Messages: Example

Disabling QA Error Recovery: Example

QA Error Recovery Messages: Example

In the following example, the QA error recovery mechanism is enabled on a Cisco 7500 series router running a Cisco IOS Release 12.3(6) image. The partial output shows some of the messages logged to the display when a duplicate buffer header is detected and the QA error recovery mechanism is enabled. In this example, the error recovery is successful.

.Feb  3 22:17:35 GMT-4: %RSP-2-QAERROR: reused or zero link error, write at addr 1A00 (QA)
log 221A0080, data E3600000 00000000
.Feb  3 22:17:35 GMT-4: %QA-3-DIAG: Failed to enqueue buffer header 0xE360
.Feb  3 22:17:35 GMT-4: %QA-3-DIAG: Approximate stack backtrace prior to interrupt:
.Feb  3 22:17:35 GMT-4: %QA-3-DIAG:
-Traceback= 6046CB8C 6046AEE4 6046AEB0 60625E78 602F887C 602F54D8 60010A24
.Feb  3 22:17:35 GMT-4: %QA-3-DIAG:     Queue 0x28 (48000140) has 16 elements
.Feb  3 22:17:35 GMT-4: %QA-3-DIAG:     Queue 0x29 (48000148) has 8 elements
.Feb  3 22:17:35 GMT-4: %QA-3-DIAG: No NULL terminator for queue 0x2A
.Feb  3 22:17:35 GMT-4: %QA-3-DIAG:     Queue 0x2A (48000150) has 3080 elements
.Feb  3 22:17:35 GMT-4: %QA-3-DIAG:     Queue 0x2B (48000158) has 20 elements
.Feb  3 22:17:35 GMT-4: %QA-3-DIAG:     Queue 0x2C (48000160) has 16 elements
.Feb  3 22:17:35 GMT-4: %QA-3-DIAG:     Queue 0x2D (48000168) has 31 elements
.Feb  3 22:17:35 GMT-4: %QA-3-DIAG: Buffer 0xE360 is element 155 on queue 0x2E
.Feb  3 22:17:35 GMT-4: %QA-3-DIAG: Buffer 0x0000, element 156 on queue 0x2E is NULL
!
.Feb  3 22:17:35 GMT-4: %QA-3-DIAG: No NULL terminator for queue 0x341
.Feb  3 22:17:35 GMT-4: %QA-3-DIAG:     Queue 0x341 (48001A08) has 2 elements
.Feb  3 22:17:35 GMT-4: %QA-3-DIAG:     Queue 0x351 (48001A88) has 3 elements
.Feb  3 22:17:35 GMT-4: %QA-3-DIAG: At least one QA queue is broken
.Feb  3 22:17:35 GMT-4: %QA-3-DIAG: Buffer header at 0x4000E360: 1D8120 2360188 0 1D8100
!
.Feb  3 22:17:35 GMT-4: %QA-3-DIAG: Trying to recover from QA ERROR.
.Feb  3 22:17:35 GMT-4: %QA-3-DIAG: Removing buffer header 0xE360 from all queues
.Feb  3 22:17:35 GMT-4: %QA-3-DIAG: Buffer 0xE360 is element 155 on queue 0x2E
.Feb  3 22:17:35 GMT-4: %QA-3-DIAG:     Queue 0x2E (48000170) has 154 elements
.Feb  3 22:17:35 GMT-4: %QA-3-DIAG: Buffer 0xE360 is element 1 on queue 0x340
.Feb  3 22:17:35 GMT-4: %QA-3-DIAG:     Queue 0x340 (48001A00) has 0 elements
.Feb  3 22:17:35 GMT-4: %QA-3-DIAG: At least one QA queue is broken
.Feb  3 22:17:35 GMT-4: %QA-3-DIAG: Recovered from QA ERROR.
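Log output like the example above can be checked programmatically for the outcome of a recovery attempt. The following Python sketch assumes the message text matches this example exactly; the function name is hypothetical, and message wording may differ in other releases.

```python
import re

# Message texts are taken from the example log in this document.
TRYING = "%QA-3-DIAG: Trying to recover from QA ERROR."
RECOVERED = "%QA-3-DIAG: Recovered from QA ERROR."
REMOVING_RE = re.compile(
    r"%QA-3-DIAG: Removing buffer header (0x[0-9A-Fa-f]+) from all queues"
)


def summarize_recovery(log_text):
    """Report whether a recovery was attempted, whether it succeeded,
    and which buffer headers were removed."""
    return {
        "attempted": TRYING in log_text,
        "recovered": RECOVERED in log_text,
        "removed_headers": REMOVING_RE.findall(log_text),
    }


sample_log = """\
.Feb  3 22:17:35 GMT-4: %QA-3-DIAG: Trying to recover from QA ERROR.
.Feb  3 22:17:35 GMT-4: %QA-3-DIAG: Removing buffer header 0xE360 from all queues
.Feb  3 22:17:35 GMT-4: %QA-3-DIAG: Recovered from QA ERROR.
"""
summary = summarize_recovery(sample_log)
print(summary)
# {'attempted': True, 'recovered': True, 'removed_headers': ['0xE360']}
```

A log in which attempted is True but recovered is False would indicate that the recovery did not complete, in which case the router is expected to fall back to a cbus complex as described earlier.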

Disabling QA Error Recovery: Example

In the following example, the QA error recovery mechanism is disabled on a Cisco 7500 series router running a Cisco IOS Release 12.3(6) image:

Router# configure terminal
Router(config)# no hw-module main-cpu qa error-recovery
Router(config)# end

Additional References

The following sections provide references related to QA error recovery.

Related Documents

Related Topic: Hardware management commands: complete command syntax, command mode, defaults, usage guidelines, and examples
Document Title:
  Cisco IOS Interface Command Reference, Release 12.1
  Cisco IOS Interface Command Reference, Release 12.2
  Cisco IOS Interface and Hardware Component Command Reference, Release 12.3
  Cisco IOS Interface and Hardware Component Command Reference, Release 12.3 T

Related Topic: Hardware management configuration examples
Document Title:
  Cisco IOS Interface and Hardware Component Configuration Guide


Standards

No new or modified standards are supported by this feature, and support for existing standards has not been modified by this feature.


MIBs

No new or modified MIBs are supported by this feature, and support for existing MIBs has not been modified by this feature.

To locate and download MIBs for selected platforms, Cisco IOS releases, and feature sets, use Cisco MIB Locator at http://www.cisco.com/go/mibs.


RFCs

No new or modified RFCs are supported by this feature, and support for existing RFCs has not been modified by this feature.


Technical Assistance

Description: Technical Assistance Center (TAC) home page, containing 30,000 pages of searchable technical content, including links to products, technologies, solutions, technical tips, and tools. Registered Cisco.com users can log in from this page to access even more content.
Link: http://www.cisco.com/public/support/tac/home.shtml


Command Reference

This section documents the following modified command only.

hw-module main-cpu qa error-recovery

hw-module main-cpu qa error-recovery

To enable the recovery mechanism for a QA error condition on a Cisco 7500 series router, use the hw-module main-cpu qa error-recovery command in global configuration mode. To disable the recovery mechanism for a QA error condition, use the no form of this command.

hw-module main-cpu qa error-recovery

no hw-module main-cpu qa error-recovery

Syntax Description

This command has no arguments or keywords.

Defaults

In Cisco IOS Release 12.0(24)S1, the recovery mechanism for a QA error condition is disabled; in all other releases, it is enabled.

Command Modes

Global configuration

Command History

Release        Modification
12.1(19)E      This command was introduced.
12.0(24)S1     This command was integrated into Cisco IOS Release 12.0(24)S1.
12.2(15)T5     This command was integrated into Cisco IOS Release 12.2(15)T5.
12.2(18)S      This command was integrated into Cisco IOS Release 12.2(18)S.
12.0(26)S      This command was integrated into Cisco IOS Release 12.0(26)S.
12.3(6)        This command was integrated into Cisco IOS Release 12.3(6).
12.3(7)T       This command was integrated into Cisco IOS Release 12.3(7)T.


Usage Guidelines

QA errors are sometimes seen in heavy traffic situations and may indicate a hardware failure or a software bug. In the case of a hardware failure, a Versatile Interface Processor (VIP) or a Route Switch Processor (RSP) must be replaced. It is possible, however, to recover from a QA error and not see another error for months. When the same buffer header is present in two different queues, the QA ASIC goes into an error condition and triggers a QA error interrupt. The QA error interrupt causes the RSP to dump the QA diagnostics and perform a cbus complex during which all the line cards are reloaded. Although the duplicate buffer header condition does not always indicate a hardware failure, the downtime of up to 300 seconds creates a real problem in the network.

The hw-module main-cpu qa error-recovery command enables a recovery mechanism for this QA error: the router removes the duplicate buffer header from all the queues that contain it, and the buffer header is then counted as lost. The show controllers cbus command displays the number of lost buffer headers and the number of QA error recoveries. With QA error recovery, the downtime is reduced to less than one second under lab conditions. Three QA errors caused by duplicate buffer headers are permitted before the router performs a cbus complex and reloads all the line cards.

After three QA error conditions caused by duplicate queued buffer headers occur, the cbus complex is initiated and the line cards reload. After the line card reload, recovery is possible for another three QA errors. If an event occurs that triggers a MEMD recarve, such as a change in the maximum transmission unit (MTU), QA error recovery can handle a maximum of three QA errors after the MEMD recarve. Other QA error conditions, such as a null buffer header on any queue, can also occur. Recovery is not possible in these cases, and the QA error triggers a cbus complex and subsequent line card reloads. The QA error condition is specific to the Cisco 7500 series routers.

Examples

The following example shows how to enable the QA error recovery mechanism when a Cisco IOS Release 12.0(24)S1 image is used on a Cisco 7500 series router. In all other supported releases, the QA error recovery mechanism is enabled by default.

Router(config)# hw-module main-cpu qa error-recovery

Related Commands

Command                  Description
show controllers cbus    Displays information about the cBus controller card.


Glossary

RSP—Route Switch Processor. The Route Processor (RP) on Cisco 7500 series routers.

VIP—Versatile Interface Processor. Interface card used in Cisco 7500 series routers.


Note Refer to Internetworking Terms and Acronyms for terms not included in this glossary.