QA Error Recovery for the Cisco 7500 Series
When a QA error condition is triggered on a Cisco 7500 series router with a Route Switch Processor (RSP), a cbus complex is initiated and the line cards are reloaded. A QA error recovery mechanism has been created that reduces the downtime from about 300 seconds to less than 1 second when a duplicate buffer header is detected in more than one queue. The QA error condition is specific to the Cisco 7500 series routers.
Feature History for the QA Error Recovery for the Cisco 7500 Series Feature
Finding Support Information for Platforms and Cisco IOS Software Images
Use Cisco Feature Navigator to find information about platform support and Cisco IOS software image support. Access Cisco Feature Navigator at http://www.cisco.com/go/fn. You must have an account on Cisco.com. If you do not have an account or have forgotten your username or password, click Cancel at the login dialog box and follow the instructions that appear.
Contents
• Prerequisites for QA Error Recovery for the Cisco 7500 Series
• Restrictions for QA Error Recovery for the Cisco 7500 Series
• Information About QA Error Recovery for the Cisco 7500 Series
• How to Configure QA Error Recovery for the Cisco 7500 Series
• Configuration Examples for QA Error Recovery for the Cisco 7500 Series
Prerequisites for QA Error Recovery for the Cisco 7500 Series
• An image that supports the QA error recovery mechanism must be running on the router.
• If Cisco IOS Release 12.0(24)S1 is running, the QA error recovery mechanism must be explicitly enabled.
Restrictions for QA Error Recovery for the Cisco 7500 Series
After three QA error conditions caused by duplicate queued buffer headers occur, the cbus complex is initiated and the line cards reload. After the line card reload, recovery is possible for another three QA errors. If an event occurs that triggers a MEMD recarve, such as a change in the maximum transmission unit (MTU), QA error recovery can handle a maximum of three QA errors after the MEMD recarve. Other QA error conditions, such as a null buffer header on any queue, can also occur. Recovery is not possible in these cases, and the QA error triggers a cbus complex and subsequent line card reloads.
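The recovery budget described above can be sketched as a small state model. This is only an illustration, not the Cisco IOS implementation: the class and method names are hypothetical, and the sketch assumes that the fourth duplicate-header error is the one that triggers the cbus complex and that a MEMD recarve restores the full budget of three recoveries.

```python
from dataclasses import dataclass

MAX_RECOVERIES = 3  # three recoveries are permitted per budget (from the text above)


@dataclass
class QaRecoveryModel:
    """Toy model of the recovery budget; hypothetical, not IOS code."""

    recoveries_used: int = 0
    cbus_complexes: int = 0

    def duplicate_header_error(self) -> str:
        """Handle a QA error caused by a duplicate queued buffer header."""
        if self.recoveries_used < MAX_RECOVERIES:
            self.recoveries_used += 1
            return "recovered"
        return self._cbus_complex()

    def other_qa_error(self) -> str:
        """Handle a QA error that cannot be recovered, such as a null buffer header."""
        return self._cbus_complex()

    def memd_recarve(self) -> None:
        """Assumption: a MEMD recarve (for example, an MTU change) resets the budget."""
        self.recoveries_used = 0

    def _cbus_complex(self) -> str:
        # The line cards reload; afterwards three more recoveries are possible.
        self.cbus_complexes += 1
        self.recoveries_used = 0
        return "cbus complex"
```

In this model, four consecutive calls to `duplicate_header_error()` return "recovered" three times and then "cbus complex", after which the budget is available again.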
Information About QA Error Recovery for the Cisco 7500 Series
This section contains information to help you understand the QA error recovery mechanism:
• QA Error Recovery for the Cisco 7500 Series Routers
QA Error Recovery for the Cisco 7500 Series Routers
QA errors are sometimes seen in heavy traffic situations and may indicate a hardware failure or a software bug. In the case of a hardware failure, a Versatile Interface Processor (VIP) or a Route Switch Processor (RSP) must be replaced. It is possible, however, to recover from a QA error and not see another error for months. When the same buffer header is present in two different queues, the QA ASIC goes into an error condition and triggers a QA error interrupt. The QA error interrupt causes the RSP to dump the QA diagnostics and perform a cbus complex during which all the line cards are reloaded. Although the duplicate buffer header condition does not always indicate a hardware failure, the downtime of up to 300 seconds creates a real problem in the network.
The hw-module main-cpu qa error-recovery command enables a recovery mechanism for a QA error. The router removes the duplicate buffer header from every queue that contains it, and the buffer header is then counted as lost. The show controllers cbus command displays the number of lost buffer headers and the number of QA error recoveries. With QA error recovery, the downtime is reduced to less than one second under lab conditions. Three QA errors caused by duplicate buffer headers are permitted before the router performs a cbus complex and reloads all the line cards.
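The removal step can be illustrated with a short sketch. It is a conceptual model only: the queues are plain Python lists keyed by a queue number, the function names are hypothetical, and the lost counter follows the prose description rather than the actual counter handling on a router.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Queues = Dict[int, List[int]]  # queue number -> buffer headers on that queue


def find_duplicate_header(queues: Queues) -> Optional[int]:
    """Return a buffer header that is queued more than once, or None."""
    seen = Counter(header for queue in queues.values() for header in queue)
    for header, count in seen.items():
        if count > 1:
            return header
    return None


def recover(queues: Queues, lost_headers: int = 0) -> Tuple[Queues, int]:
    """Remove a duplicate header from every queue and count it as lost."""
    header = find_duplicate_header(queues)
    if header is None:
        return queues, lost_headers
    cleaned = {
        number: [h for h in queue if h != header]
        for number, queue in queues.items()
    }
    return cleaned, lost_headers + 1
```

For example, if header 0xE360 sits on both queue 0x2E and queue 0x340, `recover` returns queues in which neither contains 0xE360 and a lost count that is one higher.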
How to Configure QA Error Recovery for the Cisco 7500 Series
This section contains the following tasks:
• Displaying QA Error Recovery Information (optional)
• Disabling QA Error Recovery (optional)
Displaying QA Error Recovery Information
Perform this optional task to display details about the QA error recovery mechanism.
SUMMARY STEPS
1. enable
2. show controllers cbus
DETAILED STEPS
Step 1
enable
Enables privileged EXEC mode. Enter your password if prompted.
Router> enable
Step 2
show controllers cbus
Use this command to view details about how many QA error recoveries occurred.
Router# show controllers cbus
MEMD at E0000000, 2097152 bytes (unused 1728, recarves 1962, lost/qaerror recoveries 0/1)
When a QA error occurs, the following extra logs are displayed as well as the normal logs. If the recovery is successful, a message is displayed reporting that the QA error recovery was successful.
.Feb 3 22:17:35 GMT-4: %QA-3-DIAG: Trying to recover from QA ERROR.
.Feb 3 22:17:35 GMT-4: %QA-3-DIAG: Removing buffer header 0xE360 from all queues
.Feb 3 22:17:35 GMT-4: %QA-3-DIAG: Buffer 0xE360 is element 155 on queue 0x2E
.Feb 3 22:17:35 GMT-4: %QA-3-DIAG: Queue 0x2E (48000170) has 154 elements
.Feb 3 22:17:35 GMT-4: %QA-3-DIAG: Buffer 0xE360 is element 1 on queue 0x340
.Feb 3 22:17:35 GMT-4: %QA-3-DIAG: Queue 0x340 (48001A00) has 0 elements
.Feb 3 22:17:35 GMT-4: %QA-3-DIAG: At least one QA queue is broken
.Feb 3 22:17:35 GMT-4: %QA-3-DIAG: Recovered from QA ERROR.
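If the output is collected by a script, the counters on the MEMD line can be extracted as follows. This is a minimal sketch that assumes the line has the same layout as the sample output above; the names `MemdSummary` and `parse_memd_summary` are hypothetical, and other releases may format the line differently.

```python
import re
from typing import NamedTuple, Optional


class MemdSummary(NamedTuple):
    unused: int
    recarves: int
    lost: int           # lost buffer headers
    qa_recoveries: int  # QA error recoveries


# Matches the counters inside the parentheses of the MEMD line.
_MEMD_RE = re.compile(
    r"unused (\d+), recarves (\d+), lost/qaerror recoveries (\d+)/(\d+)"
)


def parse_memd_summary(output: str) -> Optional[MemdSummary]:
    """Return the MEMD counters found in the command output, or None."""
    match = _MEMD_RE.search(output)
    if match is None:
        return None
    return MemdSummary(*(int(group) for group in match.groups()))
```

Applied to the sample line above, the function reports 0 lost buffer headers and 1 QA error recovery.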
Disabling QA Error Recovery
QA error recovery is enabled by default on all supported images for Cisco 7500 series routers except Cisco IOS Release 12.0(24)S1, on which it is disabled by default. To enable the router to recover from a QA error when using a Cisco IOS Release 12.0(24)S1 image on a Cisco 7500 series router, use the hw-module main-cpu qa error-recovery command. Perform this optional task if you need to disable the QA error recovery mechanism.
SUMMARY STEPS
1. enable
2. configure terminal
3. no hw-module main-cpu qa error-recovery
4. exit
DETAILED STEPS
Configuration Examples for QA Error Recovery for the Cisco 7500 Series
This section contains the following examples:
• QA Error Recovery Messages: Example
• Disabling QA Error Recovery: Example
QA Error Recovery Messages: Example
In the following example, the QA error recovery mechanism is enabled on a Cisco 7500 series router running a Cisco IOS Release 12.3(6) image. The partial output shows some of the messages logged to the display when a duplicate buffer header is detected and the QA error recovery mechanism is enabled. In this example, the error recovery is successful.
.Feb 3 22:17:35 GMT-4: %RSP-2-QAERROR: reused or zero link error, write at addr 1A00 (QA)
log 221A0080, data E3600000 00000000
.Feb 3 22:17:35 GMT-4: %QA-3-DIAG: Failed to enqueue buffer header 0xE360
.Feb 3 22:17:35 GMT-4: %QA-3-DIAG: Approximate stack backtrace prior to interrupt:
.Feb 3 22:17:35 GMT-4: %QA-3-DIAG:-Traceback= 6046CB8C 6046AEE4 6046AEB0 60625E78 602F887C 602F54D8 60010A24
.Feb 3 22:17:35 GMT-4: %QA-3-DIAG: Queue 0x28 (48000140) has 16 elements
.Feb 3 22:17:35 GMT-4: %QA-3-DIAG: Queue 0x29 (48000148) has 8 elements
.Feb 3 22:17:35 GMT-4: %QA-3-DIAG: No NULL terminator for queue 0x2A
.Feb 3 22:17:35 GMT-4: %QA-3-DIAG: Queue 0x2A (48000150) has 3080 elements
.Feb 3 22:17:35 GMT-4: %QA-3-DIAG: Queue 0x2B (48000158) has 20 elements
.Feb 3 22:17:35 GMT-4: %QA-3-DIAG: Queue 0x2C (48000160) has 16 elements
.Feb 3 22:17:35 GMT-4: %QA-3-DIAG: Queue 0x2D (48000168) has 31 elements
.Feb 3 22:17:35 GMT-4: %QA-3-DIAG: Buffer 0xE360 is element 155 on queue 0x2E
.Feb 3 22:17:35 GMT-4: %QA-3-DIAG: Buffer 0x0000, element 156 on queue 0x2E is NULL!
.Feb 3 22:17:35 GMT-4: %QA-3-DIAG: No NULL terminator for queue 0x341
.Feb 3 22:17:35 GMT-4: %QA-3-DIAG: Queue 0x341 (48001A08) has 2 elements
.Feb 3 22:17:35 GMT-4: %QA-3-DIAG: Queue 0x351 (48001A88) has 3 elements
.Feb 3 22:17:35 GMT-4: %QA-3-DIAG: At least one QA queue is broken
.Feb 3 22:17:35 GMT-4: %QA-3-DIAG: Buffer header at 0x4000E360: 1D8120 2360188 0 1D8100!
.Feb 3 22:17:35 GMT-4: %QA-3-DIAG: Trying to recover from QA ERROR.
.Feb 3 22:17:35 GMT-4: %QA-3-DIAG: Removing buffer header 0xE360 from all queues
.Feb 3 22:17:35 GMT-4: %QA-3-DIAG: Buffer 0xE360 is element 155 on queue 0x2E
.Feb 3 22:17:35 GMT-4: %QA-3-DIAG: Queue 0x2E (48000170) has 154 elements
.Feb 3 22:17:35 GMT-4: %QA-3-DIAG: Buffer 0xE360 is element 1 on queue 0x340
.Feb 3 22:17:35 GMT-4: %QA-3-DIAG: Queue 0x340 (48001A00) has 0 elements
.Feb 3 22:17:35 GMT-4: %QA-3-DIAG: At least one QA queue is broken
.Feb 3 22:17:35 GMT-4: %QA-3-DIAG: Recovered from QA ERROR.
Disabling QA Error Recovery: Example
In the following example, the QA error recovery mechanism is disabled on a Cisco 7500 series router running a Cisco IOS Release 12.3(6) image:
Router# configure terminal
Router(config)# no hw-module main-cpu qa error-recovery
Router(config)# end
Additional References
The following sections provide references related to QA error recovery.
Related Documents
Standards
No new or modified standards are supported by this feature, and support for existing standards has not been modified by this feature.
MIBs
RFCs
No new or modified RFCs are supported by this feature, and support for existing RFCs has not been modified by this feature.
Technical Assistance
Command Reference
This section documents the following modified command only.
• hw-module main-cpu qa error-recovery
hw-module main-cpu qa error-recovery
To enable the recovery mechanism for a QA error condition on a Cisco 7500 series router, use the hw-module main-cpu qa error-recovery command in global configuration mode. To disable the recovery mechanism for a QA error condition, use the no form of this command.
hw-module main-cpu qa error-recovery
no hw-module main-cpu qa error-recovery
Syntax Description
This command has no arguments or keywords.
Defaults
In Cisco IOS Release 12.0(24)S1, the recovery mechanism for a QA error condition is disabled; in all other releases, it is enabled.
Command Modes
Global configuration
Command History
Usage Guidelines
QA errors are sometimes seen in heavy traffic situations and may indicate a hardware failure or a software bug. In the case of a hardware failure, a Versatile Interface Processor (VIP) or a Route Switch Processor (RSP) must be replaced. It is possible, however, to recover from a QA error and not see another error for months. When the same buffer header is present in two different queues, the QA ASIC goes into an error condition and triggers a QA error interrupt. The QA error interrupt causes the RSP to dump the QA diagnostics and perform a cbus complex during which all the line cards are reloaded. Although the duplicate buffer header condition does not always indicate a hardware failure, the downtime of up to 300 seconds creates a real problem in the network.
The hw-module main-cpu qa error-recovery command enables a recovery mechanism for a QA error. The router removes the duplicate buffer header from every queue that contains it, and the buffer header is then counted as lost. The show controllers cbus command displays the number of lost buffer headers and the number of QA error recoveries. With QA error recovery, the downtime is reduced to less than one second under lab conditions. Three QA errors caused by duplicate buffer headers are permitted before the router performs a cbus complex and reloads all the line cards.
After three QA error conditions caused by duplicate queued buffer headers occur, the cbus complex is initiated and the line cards reload. After the line card reload, recovery is possible for another three QA errors. If an event occurs that triggers a MEMD recarve, such as a change in the maximum transmission unit (MTU), QA error recovery can handle a maximum of three QA errors after the MEMD recarve. Other QA error conditions, such as a null buffer header on any queue, can also occur. Recovery is not possible in these cases, and the QA error triggers a cbus complex and subsequent line card reloads. The QA error condition is specific to the Cisco 7500 series routers.
Examples
The following example shows how to enable the QA error recovery mechanism when a Cisco IOS Release 12.0(24)S1 image is used on a Cisco 7500 series router. In all other supported releases, the QA error recovery mechanism is enabled by default.
Router(config)# hw-module main-cpu qa error-recovery
Related Commands
Glossary
RSP—Route Switch Processor. The Route Processor (RP) on Cisco 7500 series routers.
VIP—Versatile Interface Processor. Interface card used in Cisco 7500 series routers.
Note
Refer to Internetworking Terms and Acronyms for terms not included in this glossary.
Copyright © 2004 Cisco Systems, Inc. All rights reserved.

