Configuring Online Diagnostics
This chapter describes how to configure the online diagnostics in Cisco IOS Release 12.2SX.
Note For complete syntax and usage information for the commands used in this chapter, see the Cisco IOS Master Command List, at this URL:
http://www.cisco.com/en/US/docs/ios/mcl/allreleasemcl/all_book.html
Tip For additional information about Cisco Catalyst 6500 Series Switches (including configuration examples and troubleshooting information), see the documents listed on this page:
http://www.cisco.com/en/US/products/hw/switches/ps708/tsd_products_support_series_home.html
This chapter consists of these sections:
•Understanding Online Diagnostics
•Configuring Online Diagnostics
•Running Online Diagnostic Tests
•Performing Memory Tests
Understanding Online Diagnostics
With online diagnostics, you can test and verify the hardware functionality of the switch while the switch is connected to a live network.
The online diagnostics contain packet switching tests that check different hardware components and verify the data path and control signals. Disruptive online diagnostic tests, such as the built-in self-test (BIST) and the disruptive loopback test, and nondisruptive online diagnostic tests, such as packet switching, run during bootup, module online insertion and removal (OIR), and system reset. The nondisruptive online diagnostic tests run as part of background health monitoring. Either disruptive or nondisruptive tests can be run at the user's request (on-demand).
The online diagnostics detect problems in the following areas:
•Hardware components
•Interfaces (GBICs, Ethernet ports, and so forth)
•Connectors (loose connectors, bent pins, and so forth)
•Solder joints
•Memory (failure over time)
Online diagnostics is one of the requirements for the high availability feature. High availability is a set of quality standards that seek to limit the impact of equipment failures on the network. A key part of high availability is detecting hardware failures and taking corrective action while the switch runs in a live network. Online diagnostics in high availability detect hardware failures and provide feedback to high availability software components to make switchover decisions.
Online diagnostics are categorized as bootup, on-demand, schedule, or health-monitoring diagnostics. Bootup diagnostics run during bootup; on-demand diagnostics run from the CLI; schedule diagnostics run at user-designated intervals or specified times when the switch is connected to a live network; and health-monitoring runs in the background.
Configuring Online Diagnostics
These sections describe how to configure online diagnostics:
•Setting Bootup Online Diagnostics Level
•Configuring On-Demand Online Diagnostics
•Scheduling Online Diagnostics
•Configuring Health-Monitoring Diagnostics
Setting Bootup Online Diagnostics Level
You can set the bootup diagnostics level to minimal or complete, or you can bypass the bootup diagnostics entirely. Enter the complete keyword to run all diagnostic tests; enter the minimal keyword to run only EARL tests and loopback tests for all ports in the switch. Enter the no form of the command to bypass all diagnostic tests. The default bootup diagnostics level is minimal.
To set the bootup diagnostic level, perform this task:
Command: Router(config)# diagnostic bootup level {minimal | complete}
Purpose: Sets the bootup diagnostic level.
This example shows how to set the bootup online diagnostic level:
Router(config)# diagnostic bootup level complete
This example shows how to display the bootup online diagnostic level:
Router(config)# do show diagnostic bootup level
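The following is a minimal sketch of bypassing the bootup diagnostics with the no form of the command described above and then confirming the setting. It assumes you are already in global configuration mode:
! Bypass all bootup diagnostic tests (no form of the command)
Router(config)# no diagnostic bootup level
! Confirm the current bootup diagnostic level
Router(config)# do show diagnostic bootup level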
Configuring On-Demand Online Diagnostics
You can run on-demand online diagnostic tests from the CLI. You can set the execution action to stop or continue testing when a failure is detected, or to stop testing after a specified number of failures by using the failure count setting. You can also configure a test to run multiple times by using the iteration setting.
You should run packet-switching tests before memory tests.
Note Do not use the diagnostic start system test all command until you have completed all of the following steps.
Because some on-demand online diagnostic tests can affect the outcome of other tests, you should perform the tests in the following order:
1. Run the nondisruptive tests.
2. Run all tests in the relevant functional area.
3. Run the TestTrafficStress test.
4. Run the TestEobcStressPing test.
5. Run the exhaustive-memory tests.
To run on-demand online diagnostic tests, perform this task:
Step 1 Run the nondisruptive tests.
To display the available tests and their attributes, and to determine which tests are in the nondisruptive category, enter the show diagnostic content command.
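As a sketch of Step 1, assuming module 1 is the module under test, you might first list the tests and their attributes (nondisruptive tests carry the N attribute in the D/N/* column) and then start only the nondisruptive tests by using the non-disruptive keyword of the diagnostic start command:
! Module 1 is an assumed example; substitute the module you are testing
Router# show diagnostic content module 1
Router# diagnostic start module 1 test non-disruptive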
Step 2 Run all tests in the relevant functional area.
Packet-switching tests fall into specific functional areas. When a problem is suspected in a particular functional area, run all tests in that functional area. If you are unsure about which functional area you need to test, or if you want to run all available tests, enter the complete keyword.
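As a sketch of Step 2, assume module 1 and the test IDs listed for module 1 in the show diagnostic content example later in this chapter, where tests 12 through 17 run from TestFibDevices to TestNATFibShortcut. Treating that ID range as one functional area is an assumption for illustration only. Test IDs vary by module type, so check show diagnostic content on your own module first:
! Assumed: IDs 12-17 are the FIB-related tests on this module
Router# diagnostic start module 1 test 12-17
! Alternatively, run every available test on the module
Router# diagnostic start module 1 test complete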
Step 3 Run the TestTrafficStress test.
This is a disruptive packet-switching test. It switches packets between pairs of ports at line rate to stress test the hardware. During this test, all of the ports are shut down, and you may see link flaps. The links recover after the test completes. The test takes several minutes to complete.
Before running this test, disable all health-monitoring tests by using the no diagnostic monitor module number test all command.
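A sketch of Step 3, assuming module 1 and test ID 25 for TestTrafficStress. That is the ID shown in the show diagnostic content example for a Supervisor Engine 32 later in this chapter, so confirm the ID on your own module:
! Disable health monitoring on the module first
Router(config)# no diagnostic monitor module 1 test all
Router(config)# end
! Assumed: test 25 is TestTrafficStress on this module
Router# diagnostic start module 1 test 25
! Answer yes when asked whether to run disruptive tests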
Step 4 Run the TestEobcStressPing test.
This is a disruptive test and tests the Ethernet over backplane channel (EOBC) connection for the module. The test takes several minutes to complete. You cannot run any of the packet-switching tests described in previous steps after running this test. However, you can run tests described in subsequent steps after running this test.
Before running this test, disable all health-monitoring tests by using the no diagnostic monitor module number test all command. The EOBC connection is disrupted during this test, which causes the health-monitoring tests to fail and the switch to take recovery action.
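A sketch of Step 4, assuming module 3 is a hypothetical switching module under test. TestEobcStressPing does not appear in the show diagnostic content example in this chapter, so look up its ID on your module and substitute it for test_id:
! Module 3 is hypothetical; disable health monitoring first
Router(config)# no diagnostic monitor module 3 test all
Router(config)# end
! Find the ID of TestEobcStressPing on this module
Router# show diagnostic content module 3
Router# diagnostic start module 3 test test_id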
Step 5 Run the exhaustive-memory tests.
Before running the exhaustive-memory tests, all health-monitoring tests should be disabled because the tests will fail with health monitoring enabled and the switch will take recovery action. Disable the health-monitoring diagnostic tests by using the no diagnostic monitor module number test all command.
Perform the exhaustive-memory tests in the following order:
1. TestFibTcamSSRAM
2. TestAclQosTcam
3. TestNetFlowTcam
4. TestAsicMemory
5. TestLinecardMemory
You must reboot the switch after running the exhaustive-memory tests before it is operational again. You cannot run any other tests on the switch after running the exhaustive-memory tests. Do not save the configuration when rebooting, because the configuration will have changed during the tests. After the reboot, reenable the health-monitoring tests by using the diagnostic monitor module number test all command.
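The following is a sketch of Step 5 on module 1. It assumes the test IDs from the show diagnostic content example later in this chapter (26 TestFibTcamSSRAM, 28 TestAclQosTcam, 29 TestNetflowTcam, 27 TestAsicMemory). TestLinecardMemory is not listed in that example, so it is omitted here. The sequence disables health monitoring, runs the tests in order, reloads without saving, and then reenables health monitoring:
Router(config)# no diagnostic monitor module 1 test all
Router(config)# end
! Assumed IDs; wait for each test to complete before starting the next
Router# diagnostic start module 1 test 26
Router# diagnostic start module 1 test 28
Router# diagnostic start module 1 test 29
Router# diagnostic start module 1 test 27
Router# reload
! Answer no if asked to save the configuration
! After the reload, reenable health monitoring
Router# configure terminal
Router(config)# diagnostic monitor module 1 test all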
To configure on-demand online diagnostics, perform this task:
Command: Router# diagnostic ondemand {iteration iteration_count | action-on-error {continue | stop} [error_count]}
Purpose: Configures how many times on-demand diagnostic tests run (iterations) and what action to take when errors are found.
This example shows how to set the on-demand testing iteration count:
Router# diagnostic ondemand iteration 3
This example shows how to set the execution action when an error is detected:
Router# diagnostic ondemand action-on-error continue 2
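As a sketch that combines both settings, assume you want each on-demand test to run five times and testing to stop when a failure is detected. You can then display the resulting on-demand settings:
! Example values only; choose an iteration count that suits your test window
Router# diagnostic ondemand iteration 5
Router# diagnostic ondemand action-on-error stop
Router# show diagnostic ondemand settings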
Scheduling Online Diagnostics
You can schedule online diagnostics to run at a designated time of day or on a daily, weekly, or monthly basis. You can schedule tests to run only once or to repeat at an interval. Use the no form of this command to remove the scheduling.
To schedule online diagnostics, perform this task:
Command: Router(config)# diagnostic schedule module number test {test_id | test_id_range | all} [port {num | num_range | all}] {on mm dd yyyy hh:mm | daily hh:mm | weekly day_of_week hh:mm}
Purpose: Schedules diagnostic tests on the specified module to run once on a specific date and time, or daily or weekly at a specified time.
This example shows how to schedule diagnostic testing on a specific date and time for a specific port on module 1:
Router(config)# diagnostic schedule module 1 test 1,2,5-9 port 3 on january 3 2003 23:32
This example shows how to schedule diagnostic testing to occur daily at a certain time for a specific port:
Router(config)# diagnostic schedule module 1 test 1,2,5-9 port 3 daily 12:34
This example shows how to schedule diagnostic testing to occur weekly on a certain day for a specific port:
Router(config)# diagnostic schedule module 1 test 1,2,5-9 port 3 weekly friday 09:23
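To remove a schedule, the following sketch uses the no form of the command, assuming the daily schedule from the previous examples, and then lists the schedules that remain on module 1:
! Remove the daily schedule configured earlier
Router(config)# no diagnostic schedule module 1 test 1,2,5-9 port 3 daily 12:34
Router(config)# do show diagnostic schedule module 1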
Configuring Health-Monitoring Diagnostics
You can configure health-monitoring diagnostic testing while the switch is connected to a live network. You can configure the execution interval for each health-monitoring test, enable the generation of a system message upon test failure, and enable or disable individual tests. Use the no form of these commands to disable testing.
To configure health-monitoring diagnostic testing, perform this task:
Step 1 Router(config)# diagnostic monitor interval module number test {test_id | test_id_range | all} [hour hh] [min mm] [second ss] [millisec ms] [day day]
Configures the health-monitoring interval of the specified tests. The no form of this command changes the interval to the default interval, or zero.
Step 2 Router(config)# [no] diagnostic monitor module number test {test_id | test_id_range | all}
Enables or disables health-monitoring diagnostic tests.
Step 3 Router# show diagnostic health
Displays the output for the health checks performed.
This example shows how to configure the specified test to run every two minutes on module 1:
Router(config)# diagnostic monitor interval module 1 test 1 min 2
This example shows how to enable health monitoring for test 1 on module 1 if it has not previously been enabled:
Router(config)# diagnostic monitor module 1 test 1
This example shows how to enable the generation of a syslog message when any health-monitoring test fails:
Router(config)# diagnostic monitor syslog
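The following sketch returns a test to its default interval and then checks the monitoring state, assuming test 1 on module 1 as in the examples above. In show diagnostic content output, the A/I attribute shows whether monitoring is active or inactive for each test:
! Restore the default health-monitoring interval for test 1
Router(config)# no diagnostic monitor interval module 1 test 1
Router(config)# do show diagnostic content module 1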
Running Online Diagnostic Tests
After you configure online diagnostics, you can start or stop diagnostic tests or display the test results. You can also see which tests are configured and what diagnostic tests have already run.
These sections describe how to run online diagnostic tests after they have been configured:
•Starting and Stopping Online Diagnostic Tests
•Running All Online Diagnostic Tests
•Displaying Online Diagnostic Tests and Test Results
Note •Before you enable any online diagnostic tests, we recommend that you enable console or monitor logging so that you can see all warning messages.
•We recommend that you run disruptive tests only when you are connected through the console. When disruptive tests are complete, a warning message on the console recommends that you reload the system to return to normal operation. Strictly follow this warning.
•While these tests are running, all ports are shut down, because a stress test is performed with the ports configured to loop internally; external traffic might alter the test results. You must reboot the switch to return it to normal operation. When you enter the command to reload the switch, the system asks whether the configuration should be saved. Do not save the configuration.
•If you run the tests on a supervisor engine, after the test completes you must reload the entire system, or power it down and then power it up.
•If you run the tests on a switching module rather than the supervisor engine, after the test completes you must reset the switching module.
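A sketch of the logging setup recommended in the first note, using standard Cisco IOS logging commands. logging console sends messages to the console, logging monitor sends them to terminal lines, and terminal monitor displays them in the current Telnet or SSH session:
Router(config)# logging console
Router(config)# logging monitor
Router(config)# end
! Needed only when connected through a terminal line rather than the console
Router# terminal monitor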
Starting and Stopping Online Diagnostic Tests
After you configure diagnostic tests to run, you can use the diagnostic start and diagnostic stop commands to begin or end a diagnostic test.
To start or stop an online diagnostic test, perform one of these tasks:
Command: Router# diagnostic start module number test {test_id | test_id_range | minimal | complete | basic | per-port | non-disruptive | all} [port {num | port#_range | all}]
Purpose: Starts a diagnostic test on a port or range of ports on the specified module.
Command: Router# diagnostic stop module number
Purpose: Stops a diagnostic test on the specified module.
This example shows how to start a diagnostic test on module 1:
Router# diagnostic start module 1 test 5
Module 1:Running test(s) 5 may disrupt normal system operation
Do you want to run disruptive tests? [no]yes
00:48:14:Running OnDemand Diagnostics [Iteration #1] ...
00:48:14:%DIAG-SP-6-TEST_RUNNING:Module 1:Running TestNewLearn{ID=5} ...
00:48:14:%DIAG-SP-6-TEST_OK:Module 1:TestNewLearn{ID=5} has completed successfully
00:48:14:Running OnDemand Diagnostics [Iteration #2] ...
00:48:14:%DIAG-SP-6-TEST_RUNNING:Module 1:Running TestNewLearn{ID=5} ...
00:48:14:%DIAG-SP-6-TEST_OK:Module 1:TestNewLearn{ID=5} has completed successfully
This example shows how to stop a diagnostic test:
Router# diagnostic stop module 1
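A sketch of running a per-port test on a subset of ports and then checking the results. It assumes test 5 is TestLoopback, a per-port disruptive test in the show diagnostic content example for module 1 later in this chapter, and that ports 1 through 4 are the ports of interest. Test IDs differ between modules (in the example above, ID 5 is TestNewLearn), so confirm the ID with show diagnostic content first:
! Assumed: test 5 is a per-port test on this module
Router# diagnostic start module 1 test 5 port 1-4
Router# show diagnostic result module 1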
Running All Online Diagnostic Tests
You can run all diagnostic tests, disruptive and nondisruptive, at once with a single command. In this case, all test dependencies will be handled automatically.
Note •Running all online diagnostic tests disrupts normal system operation. Reset the system after the diagnostic start system test all command has completed.
•Do not insert, remove, or power down modules or the supervisor engine while the system test is running.
•Do not issue any diagnostic command other than the diagnostic stop system test all command while the system test is running.
•Make sure no traffic is running in the background.
To start or stop all online diagnostic tests, perform one of these tasks:
Command: Router# diagnostic start system test all
Purpose: Executes all online diagnostic tests.
Command: Router# diagnostic stop system test all
Purpose: Stops the execution of all online diagnostic tests.
This example shows how to start all online diagnostic tests:
Router# diagnostic start system test all
*************************************************************************
* 'diagnostic start system test all' will disrupt normal system *
* operation. The system requires RESET after the command *
* 'diagnostic start system test all' has completed prior to *
* 1. DO NOT INSERT, OIR, or POWER DOWN Linecards or *
* Supervisor while system test is running. *
* 2. DO NOT ISSUE ANY DIAGNOSTIC COMMAND except *
* "diagnostic stop system test all" while system test *
* 3. PLEASE MAKE SURE no traffic is running in background. *
*************************************************************************
Do you want to continue? [no]:
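A sketch of the full sequence, assuming module 1 is one of the modules whose results you want to review. Confirm the prompt, let the system test finish before entering any other diagnostic command, review the results, and then reload without saving the configuration:
Router# diagnostic start system test all
! Answer yes at the "Do you want to continue?" prompt and wait for the test to complete
Router# show diagnostic result module 1
Router# reload
! Do not save the configuration when prompted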
Displaying Online Diagnostic Tests and Test Results
You can display the online diagnostic tests that are configured and check the results of the tests using the following show commands:
•show diagnostic content
•show diagnostic health
To display the diagnostic tests that are configured, perform this task:
Command: Router# show diagnostic {bootup level | content [module num] | events [module num] [event-type event-type] | health | ondemand settings | result [module num] [detail] | schedule [module num]}
Purpose: Displays the test results of online diagnostics and lists supported test suites.
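As a sketch of two forms from the syntax above that the following examples do not cover, assuming module 1. The first lists the tests for all modules because the optional module keyword is omitted; the second shows the diagnostic event log for module 1:
Router# show diagnostic content
Router# show diagnostic events module 1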
This example shows how to display the online diagnostics that are configured on module 1:
Router# show diagnostic content module 1
Module 1: Supervisor Engine 32 8GE (Active)
Diagnostics test suite attributes:
M/C/* - Minimal bootup level test / Complete bootup level test / NA
B/* - Basic ondemand test / NA
P/V/* - Per port test / Per device test / NA
D/N/* - Disruptive test / Non-disruptive test / NA
S/* - Only applicable to standby unit / NA
X/* - Not a health monitoring test / NA
F/* - Fixed monitoring interval test / NA
E/* - Always enabled monitoring test / NA
A/I - Monitoring is active / Monitoring is inactive
R/* - Power-down line cards and need reload supervisor / NA
K/* - Require resetting the line card after the test has completed / NA
T/* - Shut down all ports and need reload supervisor / NA
ID Test Name Attributes day hh:mm:ss.ms shold
==== ================================== ============ =============== =====
1) TestScratchRegister -------------> ***N****A*** 000 00:00:30.00 5
2) TestSPRPInbandPing --------------> ***N****A*** 000 00:00:15.00 10
3) TestTransceiverIntegrity --------> **PD****I*** not configured n/a
4) TestActiveToStandbyLoopback -----> M*PDSX**I*** not configured n/a
5) TestLoopback --------------------> M*PD*X**I*** not configured n/a
6) TestTxPathMonitoring ------------> M**N****A*** 000 00:00:02.00 10
7) TestNewIndexLearn ---------------> M**N****I*** 000 00:00:15.00 10
8) TestDontConditionalLearn --------> M**N****I*** 000 00:00:15.00 10
9) TestBadBpduTrap -----------------> M**D*X**I*** not configured n/a
10) TestMatchCapture ----------------> M**D*X**I*** not configured n/a
11) TestProtocolMatchChannel --------> M**D*X**I*** not configured n/a
12) TestFibDevices ------------------> M**N****I*** 000 00:00:15.00 10
13) TestIPv4FibShortcut -------------> M**N****I*** 000 00:00:15.00 10
14) TestL3Capture2 ------------------> M**N****I*** 000 00:00:15.00 10
15) TestIPv6FibShortcut -------------> M**N****I*** 000 00:00:15.00 10
16) TestMPLSFibShortcut -------------> M**N****I*** 000 00:00:15.00 10
17) TestNATFibShortcut --------------> M**N****I*** 000 00:00:15.00 10
18) TestAclPermit -------------------> M**N****I*** 000 00:00:15.00 10
19) TestAclDeny ---------------------> M**D*X**I*** not configured n/a
20) TestQoSTcam ---------------------> M**D*X**I*** not configured n/a
21) TestL3VlanMet -------------------> M**N****I*** 000 00:00:15.00 10
22) TestIngressSpan -----------------> M**N****I*** 000 00:00:15.00 10
23) TestEgressSpan ------------------> M**D*X**I*** not configured n/a
24) TestNetflowInlineRewrite --------> C*PD*X**I*** not configured n/a
25) TestTrafficStress ---------------> ***D*X**I**T not configured n/a
26) TestFibTcamSSRAM ----------------> ***D*X**IR** not configured n/a
27) TestAsicMemory ------------------> ***D*X**IR** not configured n/a
28) TestAclQosTcam ------------------> ***D*X**IR** not configured n/a
29) TestNetflowTcam -----------------> ***D*X**IR** not configured n/a
30) ScheduleSwitchover --------------> ***D*X**I*** not configured n/a
31) TestFirmwareDiagStatus ----------> M**N****I*** 000 00:00:15.00 10
32) TestAsicSync --------------------> ***N****A*** 000 00:00:15.00 10
33) TestUnusedPortLoopback ----------> **PN****A*** 000 00:01:00.00 10
34) TestErrorCounterMonitor ---------> ***N****A*** 000 00:00:30.00 10
35) TestPortTxMonitoring ------------> **PN****A*** 000 00:01:15.00 5
36) TestL3HealthMonitoring ----------> ***N**FEA*** 000 00:00:05.00 10
37) TestCFRW ------------------------> M*VN*X**I*** not configured n/a
This example shows how to display the online diagnostic results for module 1:
Router# show diagnostic result module 1
Current bootup diagnostic level: bypass
Module 1: Cisco ME 6524 Ethernet Switch SerialNo : CAT103956WS
Overall Diagnostic Result for Module 1 : MINOR ERROR
Diagnostic level at card bootup: bypass
Test results: (. = Pass, F = Fail, U = Untested)
1) TestSPRPInbandPing --------------> .
2) TestTransceiverIntegrity:
Port 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
----------------------------------------------------------------------------
U U U U U U U U U U U U U U U U U U U U U U U U
Port 25 26 27 28 29 30 31 32
----------------------------
3) TestScratchRegister -------------> .
4) TestNonDisruptiveLoopback:
Port 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
----------------------------------------------------------------------------
. . . . . . . . . . . . . . . . . . . . . . . .
Port 25 26 27 28 29 30 31 32
----------------------------
Port 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
----------------------------------------------------------------------------
. F . . . . . . . . . . . . . . . . . . . . . .
Port 25 26 27 28 29 30 31 32
----------------------------
6) TestNewIndexLearn ---------------> .
7) TestDontConditionalLearn --------> .
8) TestBadBpduTrap -----------------> .
9) TestMatchCapture ----------------> .
10) TestProtocolMatchChannel --------> .
11) TestFibDevices ------------------> .
12) TestIPv4FibShortcut -------------> .
13) TestL3Capture2 ------------------> .
14) TestIPv6FibShortcut -------------> .
15) TestMPLSFibShortcut -------------> .
16) TestNATFibShortcut --------------> .
17) TestAclPermit -------------------> .
18) TestAclDeny ---------------------> .
19) TestQoSTcam ---------------------> .
20) TestL3VlanMet -------------------> .
21) TestIngressSpan -----------------> .
22) TestEgressSpan ------------------> .
23) TestNetflowInlineRewrite:
Port 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
----------------------------------------------------------------------------
U U U U U U U U U U U U U U U U U U U U U U U U
Port 25 26 27 28 29 30 31 32
----------------------------
24) TestTrafficStress ---------------> U
25) TestFibTcamSSRAM ----------------> U
26) TestAsicMemory ------------------> U
27) TestLtlFpoeMemoryConsistency ----> .
28) TestAclQosTcam ------------------> U
29) TestNetflowTcam -----------------> U
30) FirmwareDiagStatus --------------> .
31) TestL3HealthMonitoring ----------> .
33) TestRwEngineOverSubscription ----> U
This example shows how to display the detailed online diagnostic results for module 1:
Router# show diagnostic result module 1 detail
Current bootup diagnostic level:minimal
Overall Diagnostic Result for Module 1 :PASS
Diagnostic level at card bootup:minimal
Test results:(. = Pass, F = Fail, U = Untested)
___________________________________________________________________________
1) TestScratchRegister -------------> .
Error code ------------------> 0 (DIAG_SUCCESS)
Total run count -------------> 330
Last test execution time ----> May 12 2003 14:49:36
First test failure time -----> n/a
Last test failure time ------> n/a
Last test pass time ---------> May 12 2003 14:49:36
Total failure count ---------> 0
Consecutive failure count ---> 0
___________________________________________________________________________
2) TestSPRPInbandPing --------------> .
Error code ------------------> 0 (DIAG_SUCCESS)
Total run count -------------> 660
Last test execution time ----> May 12 2003 14:49:38
First test failure time -----> n/a
Last test failure time ------> n/a
Last test pass time ---------> May 12 2003 14:49:38
Total failure count ---------> 0
Consecutive failure count ---> 0
___________________________________________________________________________
Error code ------------------> 0 (DIAG_SUCCESS)
Total run count -------------> 0
Last test execution time ----> n/a
First test failure time -----> n/a
Last test failure time ------> n/a
Last test pass time ---------> n/a
Total failure count ---------> 0
Consecutive failure count ---> 0
________________________________________________________________________
This example shows how to display the output for the health checks performed:
Router# show diagnostic health
CPU utilization for the past 5 mins is greater than 70%
Module 5: WS-SUP720-BASE EARL patch log -
Num. of times patch applied : 0
Num. of times patch requested : 0
Non-zero port counters for 1/8 -
Non-zero port counters for 1/9 -
Current bootup diagnostic level: minimal
Test results: (. = Pass, F = Fail, U = Untested)
36) TestErrorCounterMonitor ---------> F
Error code ------------------> 1 (DIAG_FAILURE)
Total run count -------------> 29
Last test execution time ----> Mar 16 2007 19:04:02
First test failure time -----> Mar 16 2007 19:03:21
Last test failure time ------> Mar 16 2007 19:04:02
Last test pass time ---------> Mar 16 2007 19:03:19
Total failure count ---------> 4
Consecutive failure count ---> 4
Error Records as following.
ID -- Asic Identification
RE -- Register Identification
RM -- Register Identification More
CF -- Consecutive Failure
ID IN PO RE RM DV EG CF TF
---------------------------------------------------------------
26 0 0 338 255 256 2 13 13
26 0 0 344 255 256 2 13 13
26 0 0 358 255 256 2 13 13
System Memory: 524288K total, 353225K used, 171063K free, 1000K kernel reserved
Process kernel, type POSIX, PID = 1
0K total, 0K text, 0K data, 0K stack, 0K dynamic
Process sbin/chkptd.proc, type POSIX, PID = 16386
2296K total, 1988K text, 120K data, 12K stack, 176K dynamic
65536 heapsize, 55356 allocated, 8084 free
Performing Memory Tests
Most online diagnostic tests do not need any special setup or configuration. However, the memory tests, which include the TestFibTcamSSRAM and TestLinecardMemory tests, have some required tasks and some recommended tasks that you should complete before running them.
Before you run any of the online diagnostic memory tests, perform the following tasks:
•Required tasks
–Isolate network traffic by disabling all connected ports.
–Do not send test packets during a memory test.
–Reset the system before returning the system to normal operating mode.
•Recommended task
–Turn off all background health-monitoring tests by using the no diagnostic monitor module number test all command.
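A sketch of these preparation tasks, assuming module 1 with eight Gigabit Ethernet ports numbered 1/1 through 1/8. This interface numbering is hypothetical, so substitute the connected ports on your own switch:
! Hypothetical port range; shut down all connected ports to isolate traffic
Router(config)# interface range gigabitethernet 1/1 - 8
Router(config-if-range)# shutdown
Router(config-if-range)# exit
! Turn off background health monitoring on the module
Router(config)# no diagnostic monitor module 1 test all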