Configuring GOLD
This chapter describes how to configure generic online diagnostics (GOLD) on the Catalyst 6500 series switches.
Note
For complete syntax and usage information for the commands that are used in this chapter, refer to the Catalyst 6500 Series Switch Command Reference publication.
This chapter consists of these sections:
• Understanding How Online Diagnostics Work
• Configuring Online Diagnostics
Understanding How Online Diagnostics Work
Note
GOLD is supported on the Supervisor Engine 720 and Supervisor Engine 32 only. However, earlier diagnostic commands are still supported on the Supervisor Engine 1 and Supervisor Engine 2.
Online diagnostics performs the following functions:
• Tests and verifies the hardware functionality of the supervisor engine, modules, and switch while the switch is connected to a live network.
• Performs packet-switching tests that check different hardware components and verify the data path and control signals.
• Detects problems in the following areas:
  – Hardware components
  – Interfaces (GBICs, Ethernet ports, and so forth)
  – Connectors (loose connectors, bent pins, and so forth)
  – Solder joints
  – Memory (failure over time)
Online diagnostics are categorized as follows:
• Bootup: Bootup diagnostics run during bootup, module OIR, or switchover to a backup supervisor engine.
• On-demand: On-demand diagnostics run from the CLI.
• Scheduled: Scheduled diagnostics run at user-designated intervals or specified times while the switch is connected to a live network.
• Health monitoring: Health-monitoring diagnostics run in the background.
There are two types of online diagnostic tests:
• Disruptive online diagnostic tests: These tests include the built-in self-test (BIST) and the disruptive loopback test.
• Nondisruptive online diagnostic tests: These tests include the packet-switching tests and run during bootup, line card online insertion and removal (OIR), and system reset. They also run as part of background health monitoring or on demand at your request.
Online diagnostics is one of the requirements for the high-availability feature. High availability is a set of quality standards that seek to limit the impact of equipment failures on the network. A key part of high availability is detecting hardware failures and taking corrective action while the switch runs in a live network. Online diagnostics in high availability detect hardware failures and provide feedback to high-availability software components to make switchover decisions.
Configuring Online Diagnostics
These sections describe how to configure online diagnostics:
• Specifying the Bootup Online Diagnostic Level
• Configuring On-Demand Online Diagnostics
• Configuring Online Diagnostic Health-Monitoring Tests
• Scheduling Online Diagnostics
• Specifying the Online Diagnostic Failure Response
• Specifying the Online Diagnostic Event Log Size
• Displaying Online Diagnostic Tests and Test Results
• Clearing the Online Diagnostic Configuration
Specifying the Bootup Online Diagnostic Level
You can specify the bootup online diagnostics level as minimal or complete, or you can bypass the bootup diagnostics entirely. Enter the complete keyword to run all diagnostic tests; enter the minimal keyword to run only PFC tests for the supervisor engine and loopback tests for all ports in the switch; enter the bypass keyword to bypass all diagnostic tests. The default bootup diagnostics level is minimal.
Note
Although the default is minimal, we recommend that you set the bootup online diagnostics level to complete. We do not recommend bypassing the bootup online diagnostics.
Note
The bootup diagnostic level applies to the entire switch and cannot be configured on a per-module basis.
To specify the bootup diagnostic level, perform this task in privileged mode:
Step 1 Specify the bootup diagnostic level:
set diagnostic bootup level [bypass | minimal | complete]
Step 2 Display the bootup diagnostic level:
show diagnostic bootup level
This example shows how to specify complete as the bootup diagnostic level and verify the setting:
Console> (enable) set diagnostic bootup level complete
Diagnostic level set to complete
Console> (enable) show diagnostic bootup level
Current bootup diagnostic level: complete
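If you generate switch configuration from a script, the keyword rules above can be checked before the command is sent. The following is a minimal Python sketch, not part of CatOS: the helper name bootup_level_command and the idea of building CLI text in a script are assumptions for illustration. It emits only the documented set diagnostic bootup level command with one of the three documented keywords, and it takes no module argument because the level applies to the whole switch.

```python
import warnings

# Keywords documented for "set diagnostic bootup level" (minimal is the switch default).
BOOTUP_LEVELS = ("bypass", "minimal", "complete")


def bootup_level_command(level: str = "complete") -> str:
    """Return the CLI line that sets the switch-wide bootup diagnostic level.

    Hypothetical helper: it only builds text and never connects to a switch.
    Its own default is "complete" because that is the recommended level.
    """
    if level not in BOOTUP_LEVELS:
        raise ValueError(f"unknown bootup level {level!r}; expected one of {BOOTUP_LEVELS}")
    if level == "bypass":
        # Bypassing bootup diagnostics is allowed but not recommended.
        warnings.warn("bypassing bootup diagnostics is not recommended")
    return f"set diagnostic bootup level {level}"


if __name__ == "__main__":
    print(bootup_level_command())
```

Calling bootup_level_command() with no argument returns set diagnostic bootup level complete, the same command used in the example above.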
Configuring On-Demand Online Diagnostics
Caution 
Most of the online diagnostic memory tests are on-demand tests because they are disruptive and take a long time to run. Use the memory tests only when you suspect a hardware problem, and only after you have isolated the system from the live production network.
Note
Online diagnostic tests use the Ethernet out-of-band channel (EOBC) to communicate with the rest of the system. The EOBC between the supervisor engine and the SLCP, the LCP, and the module processors must be working properly for the online diagnostic tests to run.
Use the information in these sections for configuring on-demand online diagnostics:
• Running On-Demand Online Diagnostic Tests
• On-Demand Online Diagnostic Configuration Guidelines and Restrictions
• On-Demand Online Diagnostic Configuration Procedure
• Configuring Diagnostics Operations
Running On-Demand Online Diagnostic Tests
Caution 
Use this section to familiarize yourself with the on-demand diagnostic start and diagnostic stop commands. To run any of the on-demand online diagnostic tests, use the procedure in the "On-Demand Online Diagnostic Configuration Procedure" section. Do not attempt to run these tests without following that procedure.
Use the diagnostic start command to run specific tests by test ID. The command accepts a single test ID, a range of test IDs, a subgroup of tests, or the all keyword to run all tests. Because the test ID for a given test can differ between module types and between software releases, obtain the correct test ID and test name with the show diagnostic content command before you start a test. Use the diagnostic stop module mod command to stop the tests running on the specified module. The complete syntax for the diagnostic start and diagnostic stop commands is as follows:
diagnostic start module mod_num test {all | test_ID_num | test_list | complete | minimal | non-disruptive | per-port} [port {all | port_num | port_list}]
diagnostic stop module mod
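As an illustration of the argument forms described above, the following minimal Python sketch builds a diagnostic start command line from a module number, either a subgroup keyword or a list of numeric test IDs, and an optional port selection. The helper names format_id_list and diagnostic_start are hypothetical, and the comma-separated range notation (for example, 1-3,7) is an assumption about how the CLI accepts test_list and port_list values. The sketch only produces text; look up real test IDs with show diagnostic content first.

```python
from typing import Optional, Sequence, Union

# Subgroup keywords accepted by the test argument in the syntax above.
TEST_KEYWORDS = {"all", "complete", "minimal", "non-disruptive", "per-port"}


def format_id_list(ids: Sequence[int]) -> str:
    """Collapse IDs into range notation, for example [1, 2, 3, 7] -> "1-3,7".

    Assumption: the CLI accepts comma-separated IDs and hyphenated ranges.
    """
    values = sorted(set(ids))
    if not values:
        raise ValueError("at least one ID is required")
    parts = []
    start = prev = values[0]
    for value in values[1:]:
        if value == prev + 1:  # still inside a consecutive run
            prev = value
            continue
        parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = value
    parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)


def diagnostic_start(mod: int,
                     tests: Union[str, Sequence[int]],
                     ports: Optional[Union[str, Sequence[int]]] = None) -> str:
    """Build a "diagnostic start" command line (hypothetical helper, text only)."""
    if isinstance(tests, str):
        if tests not in TEST_KEYWORDS:
            raise ValueError(f"unknown test keyword {tests!r}")
        test_arg = tests
    else:
        # Numeric IDs differ between module types and software releases,
        # so take them from "show diagnostic content" output.
        test_arg = format_id_list(tests)
    command = f"diagnostic start module {mod} test {test_arg}"
    if ports is not None:
        if isinstance(ports, str):
            if ports != "all":
                raise ValueError("the only port keyword is 'all'")
            port_arg = ports
        else:
            port_arg = format_id_list(ports)
        command += f" port {port_arg}"
    return command


if __name__ == "__main__":
    print(diagnostic_start(5, "complete"))
    print(diagnostic_start(7, [1, 2, 3, 7], ports=[1, 2]))
```

Run as a script, this prints diagnostic start module 5 test complete and diagnostic start module 7 test 1-3,7 port 1-2.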
On-Demand Online Diagnostic Configuration Guidelines and Restrictions
This section describes the configuration guidelines and restrictions for performing the on-demand test configuration steps described in the "On-Demand Online Diagnostic Configuration Procedure" section:
• After you run the tests in a particular step, the tests in earlier steps might no longer work.
• You might need to perform certain actions before and after running a test. These actions are described in the configuration procedure.
• Some of the tests are disruptive. The configuration procedure provides guidance for running the disruptive tests.
• Run the packet-switching tests before you run the memory tests.
• Always run the memory tests on the modules first and then on the supervisor engine. After the memory tests run on the supervisor engine, the system is unusable and must be rebooted immediately to resume normal operation.
Note
With software release 8.5(1), memory tests are available only for the supervisor engine. Memory tests for other modules are planned for subsequent releases.
On-Demand Online Diagnostic Configuration Procedure
To run on-demand online diagnostic tests, perform these steps:
Step 1
Run the nondisruptive tests. Nondisruptive tests are packet-switching tests and do not disrupt the system operation in any way. These tests take only a few seconds to finish.
Additional test requirements are as follows:
• User actions before running the test: None
• User actions after running the test: None
Step 2
The packet-switching tests fall into various functional test groups. Use the following tables to determine which functional test group you want to test and then run tests in that functional test group:
• Table 21-1, On-Demand Tests: Supervisor Engine
• Table 21-2, On-Demand Tests: Fabric-Enabled Modules
• Table 21-3, On-Demand Tests: Non-Fabric-Enabled Modules
Note
Not all functional test groups are present for every module, because the supported functional test groups vary by module type. If you are not sure which functional test group to select, run all of the packet-switching tests that run during bootup when the diagnostic level is set to complete by entering the diagnostic start module mod_num test complete command.
Note
If you run the loopback test and it fails on one or several ports of a module, disconnect any cables that are connected to the ports on that module, shut down all the ports on that module, and then rerun the loopback test. It is possible that some spurious packets are interfering with the loopback test and causing it to fail. Also, if the module has an inline-power daughter card, disable power to the inline-power daughter card before running the test.
Additional test requirements are as follows:
• User actions before running the test: None
• User actions after running the test: None
Table 21-1 On-Demand Tests: Supervisor Engine
Functional Test Group | Individual Tests
Per-port tests | TestLoopback
Layer 2 forwarding tests | TestNewIndexLearn, TestMatchCapture, TestDontConditionalLearn, TestProtocolMatchChannel, TestBadBpduTrap
NetFlow function | TestNetflowInlineRewrite
ACL/QoS function | TestAclPermit, TestQosTcam, TestAclDeny
IP version 4 function | TestIPv4FibShortcut, TestFibDevices, TestL3Capture2, TestNATFibShortcut
Multicast function | TestL3VlanMet
SPAN function | TestIngressSpan, TestEgressSpan
Fabric connection | TestFabricSnakeForward, TestFabricSnakeBackward
EOBC connection | Proceed to Step 4 (TestEobcStressPing)
Packet buffer issues | Proceed to Step 3 (TestTrafficStress)
Table 21-2 On-Demand Tests: Fabric-Enabled Modules
Functional Test Group | Individual Tests
Per-port tests | TestLoopback
Multicast function | TestL3VlanMet
SPAN function | TestIngressSpan, TestEgressSpan
Fabric tests | TestSynchedFabChannel
Table 21-3 On-Demand Tests: Non-Fabric-Enabled Modules
Functional Test Group | Individual Tests
Per-port tests | TestLoopback, TestNetflowInlineRewrite
Step 3
Run the TestTrafficStress test.
Note
With software release 8.5(1), the TestTrafficStress test is not available. This test might be available in subsequent releases. If the test is not available, proceed to the next step.
This disruptive packet-switching test is available only on the supervisor engine. The test pairs ports across the system and switches packets between the ports in each pair at line rate to stress-test the system. The test takes a few minutes to finish. During the test, all ports are shut down, and some ports might go up and down (flap). Any ports that are down do not come back up after the test finishes. Additional test requirements are as follows:
• User actions before running the test: Disable all health-monitoring tests for the module.
• User actions after running the test: None
Step 4
Run the TestEobcStressPing test.
Note
With software release 8.5(1), the TestEobcStressPing test is not available. This test might be available in subsequent releases. If the test is not available, proceed to the next step.
This disruptive test checks the EOBC connection for the specified module. The test takes a couple of minutes to finish. After you run this test, you cannot run the packet-switching tests described in the previous steps; you can, however, run the tests described in Step 5. Additional requirements are as follows:
• User actions before running the test: Disable all health-monitoring tests for the module. This test disrupts the EOBC connection, which causes the health-monitoring tests to fail.
• User actions after running the test: Either run the tests described in Step 5 or power cycle the module to return it to normal operation. After the module comes online, reenable the health-monitoring tests that you disabled.
Step 5
Run the exhaustive memory tests.
Exhaustive memory tests exist for both the supervisor engine and other modules. Run the memory tests on the supervisor engine only after you have run them on the other modules. This order is required because, after the supervisor engine memory tests run, the system is unusable and must be rebooted to return to normal operation.
Note
With software release 8.5(1), memory tests are available only for the supervisor engine. Memory tests for other modules are planned for subsequent releases.
Note
No other tests can be run on the supervisor engine or other modules after running the exhaustive memory tests.
Caution 
Before running any of the memory tests, you must complete all of the requirements listed under "Before running the test" at the end of this step.
You can run the exhaustive memory tests individually. Some of the tests can take several hours to finish because of the size of the memory. Because each module has several memory tests and the tests are interdependent, the order in which you run them on each module is critical.
Note
With software release 8.5(1), the TestFibTcamSSRAM test is the only available exhaustive memory test. The other memory tests (items 2 through 5 in the following list) are planned for subsequent releases.
The order for running these tests is as follows:
1. TestFibTcamSSRAM
2. TestAclQosTcam
3. TestNetflowTcam
4. TestAsicMemory
5. TestLinecardMemory
If a particular test does not exist for a module, it can be skipped.
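The two ordering rules in this step (other modules before the supervisor engine, and the numbered test order within each module, skipping tests a module lacks) can be combined into a single run plan. The following Python sketch is illustrative only: memory_test_plan is a hypothetical helper, and the per-module sets of available tests are assumed to come from your own reading of show diagnostic content output. The sample inventory in the sketch is hypothetical; with release 8.5(1), only the supervisor engine TestFibTcamSSRAM test actually exists.

```python
from typing import Dict, List, Set, Tuple

# Documented run order for the exhaustive memory tests in this step.
MEMORY_TEST_ORDER = [
    "TestFibTcamSSRAM",
    "TestAclQosTcam",
    "TestNetflowTcam",
    "TestAsicMemory",
    "TestLinecardMemory",
]


def memory_test_plan(available: Dict[int, Set[str]], supervisor: int) -> List[Tuple[int, str]]:
    """Return (module, test) pairs in run order (hypothetical helper).

    - Non-supervisor modules come first and the supervisor engine last,
      because the system is unusable after the supervisor memory tests.
    - Within each module, tests follow MEMORY_TEST_ORDER.
    - Tests that a module does not have are skipped.
    """
    modules = sorted(mod for mod in available if mod != supervisor)
    if supervisor in available:
        modules.append(supervisor)
    plan = []
    for mod in modules:
        for test in MEMORY_TEST_ORDER:
            if test in available[mod]:
                plan.append((mod, test))
    return plan


if __name__ == "__main__":
    # Hypothetical inventory: supervisor engine in slot 1, another module in slot 3.
    inventory = {1: {"TestFibTcamSSRAM"}, 3: {"TestLinecardMemory", "TestAsicMemory"}}
    for mod, test in memory_test_plan(inventory, supervisor=1):
        print(mod, test)
```

For the sample inventory, the plan runs TestAsicMemory and then TestLinecardMemory on module 3, and only then TestFibTcamSSRAM on the supervisor engine.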
Additional requirements are as follows:
• Before running the test:
  – Turn off all background health-monitoring tests on the supervisor engine and switching modules by entering the clear diagnostic monitor module mod test all command.
  – Isolate network traffic by disabling all connected ports.
  – Do not send test packets during a memory test.
  – Remove all switching modules when testing the FIB TCAM and SSRAM on the policy feature card (PFC) of the supervisor engine.
  – Reset the system or the module that you are testing before returning the system to normal operating mode.
• After running the test:
  – For supervisor engines: Reboot the switch, but do not save the configuration while rebooting, because the configuration was changed during the test.
  – For other modules: Power cycle the modules. After the modules come online, reenable the health-monitoring tests that were disabled.
Configuring Diagnostics Operations
To keep on-demand online diagnostics running until a configurable number of failures occurs, enter the continue failure_limit keyword and argument; the failure_limit range is 0 to 65534 failures. To stop on-demand online diagnostics as soon as a single failure occurs, enter the stop keyword. To run an on-demand test multiple times, enter the iterations number_of_iterations keyword and argument; the number_of_iterations range is 1 to 999.
The complete syntax for these commands is as follows:
set diagnostic ondemand action-on-failure [continue failure_limit | stop]
set diagnostic ondemand iterations number_of_iterations
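A script that prepares these settings can enforce the documented ranges before anything reaches the switch. The following Python sketch is a hypothetical helper (ondemand_commands is not a CatOS feature) that returns the two set diagnostic ondemand commands. It treats failure_limit as required with continue and forbidden with stop; that pairing is an assumption read from the syntax above, not a documented rule.

```python
from typing import List, Optional


def ondemand_commands(iterations: int, action: str,
                      failure_limit: Optional[int] = None) -> List[str]:
    """Build the on-demand iteration and failure-action commands (text only).

    Ranges come from this section: iterations 1-999, failure_limit 0-65534.
    """
    if not 1 <= iterations <= 999:
        raise ValueError("number_of_iterations must be 1 to 999")
    commands = [f"set diagnostic ondemand iterations {iterations}"]
    if action == "stop":
        if failure_limit is not None:
            raise ValueError("stop does not take a failure limit")
        commands.append("set diagnostic ondemand action-on-failure stop")
    elif action == "continue":
        # Assumption: continue is always given an explicit failure limit.
        if failure_limit is None or not 0 <= failure_limit <= 65534:
            raise ValueError("failure_limit must be 0 to 65534")
        commands.append(f"set diagnostic ondemand action-on-failure continue {failure_limit}")
    else:
        raise ValueError("action must be 'continue' or 'stop'")
    return commands
```

For example, ondemand_commands(10, "continue", 5) returns the iterations command followed by set diagnostic ondemand action-on-failure continue 5.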
Configuring Online Diagnostic Health-Monitoring Tests
You can configure health-monitoring diagnostic testing on specified modules while the switch is connected to a live network. You can specify the execution interval for each health-monitoring test, whether to generate a system message when a test fails, and whether each individual test is enabled or disabled.
Disruptive tests are disabled by default. Some, but not all, nondisruptive tests are enabled by default. To determine which tests are disruptive (D) and which are nondisruptive (N), enter the show diagnostic content module mod_list command and check the "Attributes" column. Use this information when you configure additional health-monitoring tests. We recommend that you use only nondisruptive tests for health monitoring.
To configure online diagnostic health-monitoring tests, perform this task in privileged mode:
Step 1 Specify the online diagnostic monitoring interval:
set diagnostic monitor interval module mod_num test {all | test_ID_num | test_list} hh:mm:ss
Step 2 (Optional) Enable health-monitoring diagnostic tests:
set diagnostic monitor module mod_num test {test-id | test-id-range | all}
Step 3 Enable syslog generation when a test fails:
set diagnostic monitor syslog
Step 4 Display the online diagnostic monitoring configuration:
show diagnostic content module {mod_list | all}
This example shows how to specify that the online diagnostic health-monitoring tests (test 18) be run on module 7 at 12:12:12 and 100 milliseconds every 10 days:
Console> (enable) set diagnostic monitor interval module 7 test 18 12:12:12 100 10
Diagnostic monitor interval set at 12:12:12 100 10 for module 7 test 18
This example shows how to enable test 18 on module 7:
Console> (enable) set diagnostic monitor module 7 test 18
Module 7 test 18 diagnostic monitor enable.
This example shows how to enable syslog generation when a test fails:
Console> (enable) set diagnostic monitor syslog
Diagnostic monitor syslog enable.
Scheduling Online Diagnostics
You can schedule online diagnostics to run at a designated time of day or on a daily, weekly, or monthly basis for a specific module. You can specify that all tests be run or that individual tests be run. The tests can be scheduled to run only once or be repeated at specified intervals.
Note
After you schedule the online diagnostics to run at a designated time, the online diagnostics will not run at the designated time if you change the system time using the set time command. For example, if you schedule the online diagnostics to run at 3:00 pm, then change the system time to 2:59 pm, the online diagnostics will not run at 3:00 pm.
To schedule online diagnostics, perform this task in privileged mode:
Step 1 Schedule online diagnostics:
set diagnostic schedule module slot_num test {test-id | test-id-range | all} {[port {port_num | port_num_range | all}] | [daily hh:mm] [on month day_of_month year hh:mm] [weekly day hh:mm]}
Step 2 Display the online diagnostic schedule:
show diagnostic schedule module mod_list
This example shows how to schedule test 1 to run daily at 12:12 on module 7:
Console> (enable) set diagnostic schedule module 7 test 1 daily 12:12
Diagnostic schedule set at daily 12:12 for module 7 test 1
This example shows how to schedule test 3 to run daily at 16:16 on port 1 of module 7, and then display the schedule for module 7:
Console> (enable) set diagnostic schedule module 7 test 3 port 1 daily 16:16
Diagnostic schedule set at daily 16:16 for module 7 test 3
Console> (enable) show diagnostic schedule module 7
Current Time = Fri Apr 15 2005, 16:56:06
Test ID(s) to be executed: 1-2.
Test ID(s) to be executed: 3.
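The daily and weekly forms of the scheduling syntax can be generated and checked the same way. The following Python sketch covers only those two forms (the on month day_of_month year form is left out for brevity). The helper name schedule_command is hypothetical; the two-digit, 24-hour hh:mm check is an assumption based on the examples (12:12, 16:16); and the spelling of day names for weekly schedules is not documented here, so the sketch passes the value through unchanged.

```python
import re
from typing import Optional

# Two-digit, 24-hour hh:mm, matching the examples above (assumption).
HHMM = re.compile(r"([01]\d|2[0-3]):[0-5]\d")


def schedule_command(mod: int, tests: str, when: str, hhmm: str,
                     port: Optional[str] = None, day: Optional[str] = None) -> str:
    """Build a daily or weekly "set diagnostic schedule" command (hypothetical helper)."""
    if not HHMM.fullmatch(hhmm):
        raise ValueError(f"time must be hh:mm in 24-hour format, got {hhmm!r}")
    command = f"set diagnostic schedule module {mod} test {tests}"
    if port is not None:
        command += f" port {port}"
    if when == "daily":
        command += f" daily {hhmm}"
    elif when == "weekly":
        if not day:
            raise ValueError("weekly schedules need a day")
        # Day-name spelling is not documented in this section; passed through as given.
        command += f" weekly {day} {hhmm}"
    else:
        raise ValueError("when must be 'daily' or 'weekly'")
    return command
```

For example, schedule_command(7, "3", "daily", "16:16", port="1") reproduces the second command in the example above. Remember that changing the system time with set time after scheduling can cause a scheduled run to be skipped.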
Specifying the Online Diagnostic Failure Response
You can specify the online diagnostic failure response for the supervisor engine. If you specify the ignore keyword, the supervisor engine boots up even if it fails the online diagnostics. If you specify the system keyword (the default), the supervisor engine is kept offline and module-specific corrective action is taken.
To specify the online diagnostic failure response for the supervisor engine, perform this task in privileged mode:
Step 1 Specify the online diagnostic failure response for the supervisor engine:
set diagnostic diagfail-action {ignore | system}
Step 2 Display the configured online diagnostic failure response for the supervisor engine:
show diagnostic diagfail-action
This example shows how to specify that the supervisor engine goes offline after failing the online diagnostics:
Console> (enable) set diagnostic diagfail-action system
Diagnostic failure action set to system.
Console> (enable) show diagnostic diagfail-action
Diagnostic failure action at last bootup : system
Diagnostic failure action at next reset : system
Specifying the Online Diagnostic Event Log Size
You can specify the maximum number of entries in the online diagnostic event log. The default is 500 entries, and the range is 1 to 10000 entries.
To specify the online diagnostic event-log size, perform this task in privileged mode:
Specify the online diagnostic event-log size:
set diagnostic event-log size [size]
This example shows how to specify 1000 entries for the online diagnostic event-log size:
Console> (enable) set diagnostic event-log size 1000
Diagnostic event-log size set to 1000
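A provisioning script can reject out-of-range sizes before sending the command. The following Python sketch uses a hypothetical helper, event_log_size_command, that returns either the set command for a valid size or the clear command when no size is given. Mapping "no size" to the clear command relies on the example in the "Clearing the Online Diagnostic Configuration" section, where clearing the size restores the default of 500 entries.

```python
from typing import Optional

EVENT_LOG_DEFAULT = 500            # entries; "clear diagnostic event-log size" restores this
EVENT_LOG_RANGE = range(1, 10001)  # documented range: 1 to 10000 entries


def event_log_size_command(size: Optional[int] = None) -> str:
    """Return the event-log size command (hypothetical helper, text only).

    With no size, return the clear command, which restores the default size.
    """
    if size is None:
        return "clear diagnostic event-log size"
    if not isinstance(size, int) or size not in EVENT_LOG_RANGE:
        raise ValueError("event-log size must be an integer from 1 to 10000")
    return f"set diagnostic event-log size {size}"
```

event_log_size_command(1000) returns the same command used in the example above.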
Displaying Online Diagnostic Tests and Test Results
You can display the online diagnostic tests that are configured for specific modules and check the results of the tests using the show commands.
To display online diagnostic test information, perform these tasks in normal mode:
Display the bootup diagnostic level:
show diagnostic bootup level
Display the test content for the specified modules or all modules:
show diagnostic content module [mod_num | all]
Display the configured online diagnostic failure response for the supervisor engine:
show diagnostic diagfail-action
Display the diagnostic event log:
show diagnostic events [event-type {error | info | warning}]
show diagnostic events [module {mod_list | all}]
show diagnostic events
Display the online diagnostic on-demand settings:
show diagnostic ondemand settings
Display the diagnostic test results for the specified modules or all modules:
show diagnostic result module {mod_list | all} [detail | test] [test_list] [detail]
Display the online diagnostic schedule:
show diagnostic schedule module mod_list
Display the current online diagnostic status for all modules:
show diagnostic status
Clearing the Online Diagnostic Configuration
To clear online diagnostic configuration parameters, perform these tasks in privileged mode:
Clear the bootup online diagnostic level:
clear diagnostic bootup level
Clear the online diagnostic event-log size:
clear diagnostic event-log size
Clear the online diagnostic health-monitoring configuration:
clear diagnostic monitor interval module mod_list test [test_list | all]
clear diagnostic monitor module mod test test_list
Disable syslog generation when a test fails:
clear diagnostic monitor syslog
Clear the online diagnostic scheduling information:
clear diagnostic schedule module mod_num test {test-id | test-id-range | all} {[port {port_num | port_range | all}] | [device {device_num | device_range | all}]}
This example shows how to clear the bootup online diagnostic level:
Console> (enable) clear diagnostic bootup level
Diagnostic level set to bypass
This example shows how to clear the online diagnostic event-log size:
Console> (enable) clear diagnostic event-log size
Diagnostic event-log size set to default(500)
These examples show how to clear the online diagnostic monitoring configuration:
Console> (enable) clear diagnostic monitor interval module 7 test 3
Clear diagnostic monitor interval for module 7 test 3
Console> (enable) clear diagnostic monitor module 7 test 1
Module 7 test 1 diagnostic monitor disable.
Console> (enable) clear diagnostic monitor syslog
Diagnostic monitor syslog disable.
This example shows how to clear the online diagnostic scheduling configuration for tests 1 and 2 on module 7:
Console> (enable) clear diagnostic schedule module 7 test 1-2 daily 12:12
Clear diagnostic schedule at daily 12:12 for module 7 test 1-2