Cisco Nexus 7000 Series NX-OS System Management Configuration Guide, Release 6.x
Configuring Online Diagnostics

Table of Contents

Configuring Online Diagnostics

Finding Feature Information

Information About Online Diagnostics

Online Diagnostic Overview

Bootup Diagnostics

Runtime or Health Monitoring Diagnostics

Recovery Actions on Specified Health-Monitoring Diagnostics

On-Demand Diagnostics

High Availability

Virtualization Support

Licensing Requirements for Online Diagnostics

Prerequisites for Online Diagnostics

Guidelines and Limitations

Default Settings

Configuring Online Diagnostics

Setting the Bootup Diagnostic Level

Activating a Diagnostic Test

Setting a Diagnostic Test as Inactive

Configuring Corrective Action

Starting or Stopping an On-Demand Diagnostic Test

Clearing Diagnostic Results

Simulating Diagnostic Results

Verifying the Online Diagnostics Configuration

Configuration Examples for Online Diagnostics

Additional References

Related Documents

Standards

Feature History for Online Diagnostics

Finding Feature Information

Your software release might not support all the features documented in this module. For the latest caveats and feature information, see the Bug Search Tool at https://tools.cisco.com/bugsearch and the release notes for your software release. To find information about the features documented in this module, and to see a list of the releases in which each feature is supported, see the “New and Changed Information” chapter or the Feature History table below.

Information About Online Diagnostics

Online diagnostics help you verify that hardware and internal data paths are operating as designed so that you can rapidly isolate faults.


Online Diagnostic Overview

With online diagnostics, you can test and verify the hardware functionality of the device while the device is connected to a live network.

The online diagnostics contain tests that check different hardware components and verify the data path and control signals. Disruptive online diagnostic tests (such as the disruptive loopback test) and nondisruptive online diagnostic tests (such as the ASIC register check) run during bootup, line module online insertion and removal (OIR), and system reset. The nondisruptive online diagnostic tests run as part of the background health monitoring, and you can run these tests on demand.

Online diagnostics are categorized as bootup, runtime or health-monitoring diagnostics, and on-demand diagnostics. Bootup diagnostics run during bootup, health-monitoring tests run in the background, and on-demand diagnostics run once or at user-designated intervals when the device is connected to a live network.

Bootup Diagnostics

Bootup diagnostics run during bootup and detect faulty hardware before Cisco NX-OS brings a module online. For example, if you insert a faulty module in the device, bootup diagnostics test the module and take it offline before the device uses the module to forward traffic.

Bootup diagnostics also check the connectivity between the supervisor and module hardware and the data and control paths for all the ASICs. Table 13-1 describes the bootup diagnostic tests for a module and a supervisor.

 

Table 13-1 Bootup Diagnostics

Diagnostic | Description

Module
EOBCPortLoopback | Disruptive test, not an on-demand test. Ethernet out of band.
OBFL | Verifies the integrity of the onboard failure logging (OBFL) flash.
FIPS (note 1) | Disruptive test; run only when FIPS is enabled on the system. An internal test that runs during module bootup to validate the security device on the module.
BootupPortLoopback (note 2) | Disruptive test, not an on-demand test. A PortLoopback test that runs only during module bootup.

Supervisor
USB | Nondisruptive test. Checks the USB controller initialization on a module.
CryptoDevice (note 3) | Nondisruptive test. Checks the Cisco Trusted Security (CTS) device initialization on a module.
ManagementPortLoopback | Disruptive test, not an on-demand test. Tests loopback on the management port of a module.
EOBCPortLoopback | Disruptive test, not an on-demand test. Ethernet out of band.
OBFL | Verifies the integrity of the onboard failure logging (OBFL) flash.

1. F1 and F2 Series modules do not support the FIPS test.

2. F2 Series modules support the BootupPortLoopback test beginning with Cisco NX-OS Release 6.0(2). The F3 Series modules do not support the BootupPortLoopback test.

3. The CryptoDevice test is available only for Supervisor 1.

Bootup diagnostics log failures to onboard failure logging (OBFL) and syslog and trigger a diagnostic LED indication (on, off, pass, or fail).

You can configure Cisco NX-OS to either bypass the bootup diagnostics or run the complete set of bootup diagnostics. See the “Setting the Bootup Diagnostic Level” section.

Runtime or Health Monitoring Diagnostics

Runtime diagnostics are also called health monitoring (HM) diagnostics. These diagnostics provide information about the health of a live device. They detect runtime hardware errors, memory errors, the degradation of hardware modules over time, software faults, and resource exhaustion.

Health monitoring diagnostics are nondisruptive and run in the background to ensure the health of a device that is processing live network traffic. You can enable or disable health monitoring tests or change their runtime interval. Table 13-2 describes the health monitoring diagnostics and test IDs for a module and a supervisor.

 

Table 13-2 Health Monitoring Nondisruptive Diagnostics

Diagnostic | Default Interval | Default Setting | Description

Module
ASICRegisterCheck | 1 minute | active | Checks read/write access to scratch registers for the ASICs on a module.
PrimaryBootROM | 30 minutes | active | Verifies the integrity of the primary boot device on a module.
SecondaryBootROM | 30 minutes | active | Verifies the integrity of the secondary boot device on a module.
PortLoopback (note 4) | 15 minutes | active | Verifies connectivity through every port that is administratively down on every module in the system.
RewriteEngineLoopback (note 5) | 1 minute | active | Verifies the integrity of the nondisruptive loopback for all ports up to the 1 Engine ASIC device.
SnakeLoopback (note 6) | 20 minutes | active | Performs a nondisruptive loopback on all ports, even those ports that are not in the shut state. The ports are formed into a snake during module bootup, and the supervisor checks the snake connectivity periodically.
FIPS (note 7) | Not applicable | Not applicable | Runs on CTS-enabled ports when the interface is enabled with a no shut command. This internal test validates the security device on the module.

Supervisor
ASICRegisterCheck | 20 seconds | active | Checks read/write access to scratch registers for the ASICs on the supervisor.
NVRAM | 5 minutes | active | Verifies the sanity of the NVRAM blocks on a supervisor.
RealTimeClock | 5 minutes | active | Verifies that the real-time clock on the supervisor is ticking.
PrimaryBootROM | 30 minutes | active | Verifies the integrity of the primary boot device on the supervisor.
SecondaryBootROM | 30 minutes | active | Verifies the integrity of the secondary boot device on the supervisor.
CompactFlash | 30 minutes | active | Verifies access to the internal compact flash devices.
ExternalCompactFlash | 30 minutes | active | Verifies access to the external compact flash devices.
PwrMgmtBus | 30 seconds | active | Verifies the standby power management control bus.
SpineControlBus (note 8) | 30 seconds | active | Verifies the availability of the standby spine module control bus.
SystemMgmtBus | 30 seconds | active | Verifies the availability of the standby system management bus.
StatusBus | 30 seconds | active | Verifies the status transmitted by the status bus for the supervisor, modules, and fabric cards.
StandbyFabricLoopback | 30 seconds | active | Verifies the connectivity of the standby supervisor to the crossbars on the spine card.
PCIeBus | 30 seconds | active | Verifies PCIe connectivity from the supervisor to the crossbar ASICs on the fabric cards.

4. The PortLoopback test is supported on all modules except the 48-port 1G copper Ethernet module and the F2 and F2e Series modules. F2 and F2e Series modules use the nondisruptive SnakeLoopback test instead.

5. The RewriteEngineLoopback test is deprecated for F1 Series modules in Cisco NX-OS Release 5.2. Beginning with Cisco NX-OS Release 6.1, F2 Series modules support the RewriteEngineLoopback test. The F3 Series modules do not support this test.

6. The SnakeLoopback test is deprecated for F1 Series modules in Cisco NX-OS Release 5.2. Beginning with Cisco NX-OS Release 6.1, F2 Series modules support the SnakeLoopback test. The F3 Series modules do not support this test.

7. F1 Series and F2 Series modules do not support the FIPS test.

8. Beginning with Cisco NX-OS Release 5.2, the SpineControlBus test is enabled by default on the standby supervisor.
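
The default intervals in Table 13-2 are easier to compare when they are expressed in one unit. The following Python sketch is illustrative only: the dictionary and function names are hypothetical, and the values are transcribed from the module rows of the table rather than read from a device.

```python
# Default health-monitoring intervals for module tests, in seconds.
# Values are transcribed from Table 13-2; nothing here queries a device.
MODULE_DEFAULT_INTERVALS = {
    "ASICRegisterCheck": 60,
    "PrimaryBootROM": 30 * 60,
    "SecondaryBootROM": 30 * 60,
    "PortLoopback": 15 * 60,
    "RewriteEngineLoopback": 60,
    "SnakeLoopback": 20 * 60,
}

SECONDS_PER_DAY = 24 * 60 * 60


def runs_per_day(interval_seconds):
    """Return how many full intervals fit into 24 hours."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return SECONDS_PER_DAY // interval_seconds


def most_frequent_tests(intervals):
    """Return the names of the tests that share the shortest interval, sorted."""
    shortest = min(intervals.values())
    return sorted(name for name, secs in intervals.items() if secs == shortest)
```

A helper like this can indicate which tests run most often before you lengthen or shorten an interval.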

Recovery Actions on Specified Health-Monitoring Diagnostics

This feature is disabled by default.

Beginning with Cisco NX-OS Release 6.2(8), you can configure the system to take disruptive action if the system detects failure on one of the following health-monitoring, or runtime, tests:

  • PortLoopback test
  • RewriteEngineLoopback test
  • SnakeLoopback test
  • StandbyFabricLoopback test

By default, these tests take no corrective recovery action when they detect a hardware failure. The default action through EEM generates alerts (callhome, syslog) and logs the failure (OBFL, exception logs). These actions are informative, but they do not remove faulty devices from the network, which can lead to network disruption or traffic black holing. You must manually shut down the faulty devices to recover the network.

You can configure the system to take disruptive action after repeated failures on these tests. This feature enables or disables the corrective (conservative) action on all four tests simultaneously; the corrective action taken differs for each test. After a test crosses its maximum consecutive failure count, the system takes corrective action.

The corrective action for each test is as follows:

  • PortLoopback test: When you enable this feature, the system moves the port registering faults to an error-disabled state.

Note This applies only to those modules that support the nondisruptive PortLoopback test.


  • RewriteEngineLoopback test: With this feature enabled, the system takes different corrective action depending on whether the fault is with the supervisor, the fabric, or the port, as follows:

On a chassis with a standby supervisor, when the system detects a fault with the supervisor, the system switches over to the standby supervisor. If there is no standby supervisor in the chassis, the system does not take any action.

After failures on the fabric, the system reloads the fabric three times. If the failure persists, the system powers down the fabric.

After failures on a port, the system moves the faulty port to the error-disabled state.

  • SnakeLoopback test: When you enable this feature and the test detects 10 consecutive failures on any port of the module, the system moves the faulty port to an error-disabled state.
  • StandbyFabricLoopback test: The system attempts to reload the standby supervisor three times after it receives errors on this test. If the system cannot reload the standby supervisor, the system powers off the supervisor.

Finally, the system maintains a history of the recovery actions that includes details of each action, the testing type, and the severity. You can display these counters.
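
The consecutive-failure rule described above can be modeled with a short Python sketch. This is a simplified illustration, not the Cisco NX-OS implementation: the class and method names are hypothetical, the threshold of 10 comes from the SnakeLoopback description (this chapter does not state the thresholds for the other tests), and treating a passing run as resetting the count is an assumption.

```python
class ConsecutiveFailureTracker:
    """Counts back-to-back failures and reports when a threshold is reached."""

    def __init__(self, threshold):
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.consecutive_failures = 0

    def record(self, passed):
        """Record one test run; return True when corrective action is due."""
        if passed:
            # Assumption: a passing run breaks the failure streak.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        return self.consecutive_failures >= self.threshold


# Nine failures, one pass, then ten more failures.
tracker = ConsecutiveFailureTracker(threshold=10)
results = [False] * 9 + [True] + [False] * 10
actions = [tracker.record(passed) for passed in results]
```

In this model only the final run signals corrective action, because the single passing run interrupts the first streak before it reaches the threshold.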

On-Demand Diagnostics

On-demand tests help localize faults and are usually needed in one of the following situations:

  • To respond to an event that has occurred, such as isolating a fault.
  • In anticipation of an event that may occur, such as a resource exceeding its utilization limit.

You can run all the health monitoring tests on demand.

You can schedule on-demand diagnostics to run immediately. See the “Starting or Stopping an On-Demand Diagnostic Test” section for more information.

You can also modify the default interval for a health monitoring test. See the “Activating a Diagnostic Test” section for more information.

High Availability

A key part of high availability is detecting hardware failures and taking corrective action while the device runs in a live network. Online diagnostics in high availability detect hardware failures and provide feedback to high availability software components to make switchover decisions.

Cisco NX-OS supports stateless restarts for online diagnostics. After a reboot or supervisor switchover, Cisco NX-OS applies the running configuration.

Virtualization Support

Cisco NX-OS supports online diagnostics in the default virtual device context (VDC) or, beginning with Cisco NX-OS Release 6.1, in the admin VDC. By default, Cisco NX-OS places you in the default VDC. See the Cisco Nexus 7000 Series NX-OS Virtual Device Context Configuration Guide for more information.

Online diagnostics are virtual routing and forwarding (VRF) aware. You can configure online diagnostics to use a particular VRF to reach the online diagnostics SMTP server.

Licensing Requirements for Online Diagnostics

 

Product | License Requirement
Cisco NX-OS | Online diagnostics require no license. Any feature not included in a license package is bundled with the Cisco NX-OS system images and is provided at no extra charge to you. For a complete explanation of the Cisco NX-OS licensing scheme, see the Cisco NX-OS Licensing Guide.

Prerequisites for Online Diagnostics

Online diagnostics have the following prerequisite:

  • If you configure VDCs, install the appropriate license and go to the VDC that you want to configure. See the Cisco Nexus 7000 Series NX-OS Virtual Device Context Configuration Guide for configuration information and the Cisco NX-OS Licensing Guide for licensing information.

Guidelines and Limitations

  • You cannot run disruptive online diagnostic tests on demand.
  • The F1 Series modules support the following tests: ASICRegisterCheck, PrimaryBootROM, SecondaryBootROM, EOBCPortLoopback, PortLoopback, and BootupPortLoopback.
  • Support for the RewriteEngineLoopback and SnakeLoopback tests on F1 Series modules is deprecated in Cisco NX-OS Release 5.2.
  • Beginning with Cisco NX-OS Release 6.1, F2 Series modules support the RewriteEngineLoopback and SnakeLoopback tests.

Default Settings

Table 13-3 lists the default settings for online diagnostic parameters.

 

Table 13-3 Default Online Diagnostic Parameters

Parameters | Default
Bootup diagnostics level | complete
Nondisruptive tests | active

Configuring Online Diagnostics



Note Be aware that the Cisco NX-OS commands for this feature may differ from those commands used in Cisco IOS.


Setting the Bootup Diagnostic Level

You can configure the bootup diagnostics to run the complete set of tests, or you can bypass all bootup diagnostic tests for a faster module bootup time.


Note We recommend that you set the bootup online diagnostics level to complete. We do not recommend bypassing the bootup online diagnostics.


BEFORE YOU BEGIN

Make sure that you are in the correct VDC. To change the VDC, use the switchto vdc command.

SUMMARY STEPS

1. config t

2. diagnostic bootup level { complete | bypass }

3. (Optional) show diagnostic bootup level

4. (Optional) copy running-config startup-config

DETAILED STEPS

 

Command
Purpose

Step 1

config t

 

Example:

switch# config t

Enter configuration commands, one per line. End with CNTL/Z.

switch(config)#

Places you in global configuration mode.

Step 2

diagnostic bootup level { complete | bypass }

 

Example:

switch(config)# diagnostic bootup level complete

Configures the bootup diagnostic level to trigger diagnostics as follows when the device boots:

  • complete —Perform all bootup diagnostics. The default is complete.
  • bypass —Do not perform any bootup diagnostics.

Step 3

show diagnostic bootup level

 

Example:

switch(config)# show diagnostic bootup level

(Optional) Displays the bootup diagnostic level (bypass or complete) that is currently in place on the device.

Step 4

copy running-config startup-config

 

Example:

switch(config)# copy running-config startup-config

(Optional) Copies the running configuration to the startup configuration.

Activating a Diagnostic Test

You can set a diagnostic test as active and optionally modify the interval (in hours, minutes, and seconds) at which the test runs.

BEFORE YOU BEGIN

Make sure that you are in the correct VDC. To change the VDC, use the switchto vdc command.

SUMMARY STEPS

1. config t

2. (Optional) diagnostic monitor interval module slot test [ test-id | name | all ] hour hour min minutes second sec

3. diagnostic monitor module slot test [ test-id | name | all ]

4. (Optional) show diagnostic content module { slot | all }

DETAILED STEPS

 

Command
Purpose

Step 1

config t

 

Example:

switch# config t

Enter configuration commands, one per line. End with CNTL/Z.

switch(config)#

Places you in global configuration mode.

Step 2

diagnostic monitor interval module slot test [test-id | name | all ] hour hour min minutes second sec

 

Example:

switch(config)# diagnostic monitor interval module 6 test 3 hour 1 min 0 sec 0

(Optional) Configures the interval at which the specified test is run. If no interval is set, the test runs at the interval set previously, or the default interval.

The argument ranges are as follows:

  • slot—The range is from 1 to 10.
  • test-id—The range is from 1 to 14.
  • name—Can be any case-sensitive alphanumeric string up to 32 characters.
  • hour —The range is from 0 to 23 hours.
  • minute—The range is from 0 to 59 minutes.
  • second —The range is from 0 to 59 seconds.

Step 3

diagnostic monitor module slot test [test-id | name | all ]

 

Example:

switch(config)# diagnostic monitor module 6 test 3

Activates the specified test.

The argument ranges are as follows:

  • slot—The range is from 1 to 10.
  • test-id—The range is from 1 to 14.
  • name—Can be any case-sensitive alphanumeric string up to 32 characters.

Step 4

show diagnostic content module { slot | all }

 

Example:

switch(config)# show diagnostic content module 6

(Optional) Displays information about the diagnostics and their attributes.
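
When interval commands are generated by a script, checking the arguments against the ranges listed in Step 2 before sending them can catch mistakes early. The following Python sketch assumes the ranges stated in this section; the function names are hypothetical, and the code only builds the command string without contacting a device.

```python
def _check(label, value, low, high):
    """Raise ValueError if value falls outside the inclusive range."""
    if not low <= value <= high:
        raise ValueError(f"{label} must be from {low} to {high}, got {value}")


def interval_command(slot, test, hour, minutes, sec):
    """Render a 'diagnostic monitor interval' command after range checks.

    'test' is either a numeric test ID, a test name, or the keyword 'all'.
    """
    _check("slot", slot, 1, 10)
    if isinstance(test, int):
        _check("test-id", test, 1, 14)
    elif not (isinstance(test, str) and test.isalnum() and len(test) <= 32):
        raise ValueError("test name must be alphanumeric, up to 32 characters")
    _check("hour", hour, 0, 23)
    _check("minutes", minutes, 0, 59)
    _check("sec", sec, 0, 59)
    return (
        f"diagnostic monitor interval module {slot} test {test} "
        f"hour {hour} min {minutes} second {sec}"
    )
```

For example, interval_command(6, 3, 1, 0, 0) renders the one-hour interval from Step 2 using the keywords from the command syntax, and a slot of 11 raises ValueError.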

Setting a Diagnostic Test as Inactive

You can set a diagnostic test as inactive. Inactive tests keep their current configuration but do not run at the scheduled interval.

Use the following command in global configuration mode to set a diagnostic test as inactive:

 

Command
Purpose

no diagnostic monitor module slot test [test-id | name | all ]

 

Example:

switch(config)# no diagnostic monitor module 6 test 3

Inactivates the specified test.

The argument ranges are as follows:

  • slot —The range is from 1 to 10.
  • test-id —The range is from 1 to 14.
  • name —Can be any case-sensitive alphanumeric string up to 32 characters.

Configuring Corrective Action

You can configure the device to take corrective action when it detects failures on any of the following tests:

  • PortLoopback test
  • RewriteEngineLoopback test
  • SnakeLoopback test
  • StandbyFabricLoopback test

BEFORE YOU BEGIN

Make sure that you are in the correct VDC. To change the VDC, use the switchto vdc command.

SUMMARY STEPS

1. config t

2. (Optional) [ no ] diagnostic eem action conservative

DETAILED STEPS

 

Command
Purpose

Step 1

config t

 

Example:

switch# config t

Enter configuration commands, one per line. End with CNTL/Z.

switch(config)#

Places you in global configuration mode.

Step 2

diagnostic eem action conservative

 

Example:

switch(config)# diagnostic eem action conservative

(Optional) Enables corrective actions when the system detects failures on the four runtime tests listed above.

Use the no form of the command to disable these corrective actions.

Starting or Stopping an On-Demand Diagnostic Test

You can start or stop an on-demand diagnostic test. You can optionally modify the number of iterations to repeat this test, and the action to take if the test fails.

We recommend that you only manually start a disruptive diagnostic test during a scheduled network maintenance time.

BEFORE YOU BEGIN

Make sure that you are in the correct VDC. To change the VDC, use the switchto vdc command.

SUMMARY STEPS

1. (Optional) diagnostic ondemand iteration number

2. (Optional) diagnostic ondemand action-on-failure { continue failure-count num-fails | stop }

3. diagnostic start module slot test [ test-id | name | all | non-disruptive ] [ port port-number | all ]

4. diagnostic stop module slot test [ test-id | name | all ]

5. (Optional) show diagnostic status module slot

DETAILED STEPS

 

Command
Purpose

Step 1

diagnostic ondemand iteration number

 

Example:

switch# diagnostic ondemand iteration 5

(Optional) Configures the number of times that the on-demand test runs. The range is from 1 to 999. The default is 1.

Step 2

diagnostic ondemand action-on-failure { continue failure-count num-fails | stop }

 

Example:

switch# diagnostic ondemand action-on-failure stop

(Optional) Configures the action to take if the on-demand test fails. The num-fails range is from 1 to 999. The default is 1.

Step 3

diagnostic start module slot test [test-id | name | all | non-disruptive ] [ port port-number | all ]

 

Example:

switch# diagnostic start module 6 test all

Starts one or more diagnostic tests on a module. The module slot range is from 1 to 10. The test-id range is from 1 to 14. The test name can be any case-sensitive alphanumeric string up to 32 characters. The port range is from 1 to 48.

Step 4

diagnostic stop module slot test [test-id | name | all ]

 

Example:

switch# diagnostic stop module 6 test all

Stops one or more diagnostic tests on a module. The module slot range is from 1 to 10. The test-id range is from 1 to 14. The test name can be any case-sensitive alphanumeric string up to 32 characters.

Step 5

show diagnostic status module slot

 

Example:

switch# show diagnostic status module 6

(Optional) Verifies that the diagnostic has been scheduled.
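
The five steps above are normally entered in a fixed order. As a minimal sketch, assuming the argument ranges stated in this section, the following Python function assembles that order as a list of command strings. The function name and its parameters are hypothetical, and the code does not send anything to a device.

```python
def ondemand_commands(slot, test="all", iterations=1,
                      stop_on_failure=True, failure_count=1):
    """Return the commands for one on-demand run, in the order of the steps."""
    if not 1 <= slot <= 10:
        raise ValueError("slot must be from 1 to 10")
    if not 1 <= iterations <= 999:
        raise ValueError("iterations must be from 1 to 999")

    commands = [f"diagnostic ondemand iteration {iterations}"]
    if stop_on_failure:
        commands.append("diagnostic ondemand action-on-failure stop")
    else:
        if not 1 <= failure_count <= 999:
            raise ValueError("failure_count must be from 1 to 999")
        commands.append(
            "diagnostic ondemand action-on-failure "
            f"continue failure-count {failure_count}"
        )
    # Start the test, then check that it has been scheduled.
    commands.append(f"diagnostic start module {slot} test {test}")
    commands.append(f"show diagnostic status module {slot}")
    return commands
```

Calling ondemand_commands(6, iterations=5) reproduces the example values used in Steps 1 through 3 and Step 5; the stop command from Step 4 is left out because it is needed only to interrupt a running test.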

Clearing Diagnostic Results

You can clear diagnostic test results.

Use the following command in any mode to clear the diagnostic test results:

 

Command
Purpose

diagnostic clear result module [ slot | all ] test {test-id | all }

 

Example:

switch# diagnostic clear result module 2 test all

Clears the test result for the specified test.

The argument ranges are as follows:

  • slot —The range is from 1 to 10.
  • test-id —The range is from 1 to 14.

Simulating Diagnostic Results

You can simulate a diagnostic test result.

Use the following command in any mode to simulate a diagnostic test result:

 

Command
Purpose

diagnostic test simulation module slot test test-id {fail | random-fail | success} [ port number | all ]

 

Example:

switch# diagnostic test simulation module 2 test 2 fail

Simulates a test result. The test-id range is from 1 to 14. The port range is from 1 to 48.

Use the following command in any mode to clear the simulated diagnostic test result:

 

Command
Purpose

diagnostic test simulation module slot test test-id clear

 

Example:

switch# diagnostic test simulation module 2 test 2 clear

Clears the simulated test result. The test-id range is from 1 to 14.
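
A simulated result stays in place until it is cleared, so it can help to generate the simulate and clear commands as a pair. The following Python sketch assumes the test-id range given above and the slot range (1 to 10) used elsewhere in this chapter; the function and constant names are hypothetical, and the code only builds strings.

```python
# Result keywords accepted by the simulation command, as listed above.
SIMULATED_RESULTS = ("fail", "random-fail", "success")


def simulation_commands(slot, test_id, result):
    """Return a (simulate, clear) pair of commands for one test on one module."""
    if not 1 <= slot <= 10:
        raise ValueError("slot must be from 1 to 10")
    if not 1 <= test_id <= 14:
        raise ValueError("test-id must be from 1 to 14")
    if result not in SIMULATED_RESULTS:
        raise ValueError(f"result must be one of {SIMULATED_RESULTS}")
    base = f"diagnostic test simulation module {slot} test {test_id}"
    return f"{base} {result}", f"{base} clear"
```

Keeping the two strings together makes it harder to leave a simulated failure active after a verification exercise.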

Verifying the Online Diagnostics Configuration

To display online diagnostics configuration information, perform one of the following tasks:

 

Command | Purpose
show diagnostic bootup level | Displays information about bootup diagnostics.
show diagnostic content module { slot | all } | Displays information about diagnostic test content for a module.
show diagnostic description module slot test [ test-name | all ] | Displays the diagnostic description.
show diagnostic events [ error | info ] | Displays diagnostic events by error and information event type.
show diagnostic ondemand setting | Displays information about on-demand diagnostics.
show diagnostic result module slot [ test [ test-name | all ]] [ detail ] | Displays information about the results of a diagnostic.
show diagnostic simulation module slot | Displays information about a simulated diagnostic.
show diagnostic status module slot | Displays the test status for all tests on a module.
show hardware capacity [ eobc | fabric-utilization | forwarding | interface | module | power ] | Displays information about the hardware capabilities and current hardware utilization by the system.
show module | Displays module information including the online diagnostic test status.
show diagnostic eem action history | Displays the recovery action history, including the number of switchovers, reloads, and power offs, as well as timestamps, failure reason, module number, port list, test name, testing type, and severity. This data is maintained across ungraceful reloads.
show event manager events action-log event-type [ gold | gold_sup_failure | gold_fabric_failure | gold_module_failure | gold_port_failure ] | Displays the recovery action history, including the number of switchovers, reloads, and power offs, as well as timestamp, failure reason, module ID, port list, test name, testing type, and severity. This data is maintained across ungraceful reloads.

Configuration Examples for Online Diagnostics

This example shows how to start all on-demand tests on module 6:

diagnostic start module 6 test all

This example shows how to activate test 2 and set the test interval on module 6:

conf t

diagnostic monitor module 6 test 2

diagnostic monitor interval module 6 test 2 hour 3 min 30 sec 0

Additional References

For additional information related to implementing online diagnostics, see the following sections:

Related Documents

Related Topic
Document Title

Online diagnostics CLI commands

Cisco Nexus 7000 Series NX-OS System Management Command Reference

VDCs and VRFs

Cisco Nexus 7000 Series NX-OS Virtual Device Context Configuration Guide

Standards

Standards
Title

No new or modified standards are supported by this feature, and support for existing standards has not been modified by this feature.

Feature History for Online Diagnostics

Table 13-4 lists the release history for this feature.

 

Table 13-4 Feature History for Online Diagnostics

Feature Name | Releases | Feature Information
Online diagnostics (GOLD) | 6.2(8) | Ability to configure recovery actions for four diagnostic tests.
Online diagnostics (GOLD) | 6.1(1) | Added support for Supervisor 2 and M2 Series modules.
Online diagnostics (GOLD) | 6.1(1) | Added support for F2 Series modules for the RewriteEngineLoopback and SnakeLoopback tests.
Online diagnostics (GOLD) | 6.1(1) | Added support for configuring online diagnostics in the admin VDC.
Online diagnostics (GOLD) | 6.0(1) | Added support for F2 Series modules.
Online diagnostics (GOLD) | 5.2(1) | Enabled the SpineControlBus test on the standby supervisor.
Online diagnostics (GOLD) | 5.2(1) | Deprecated the SnakeLoopback test on F1 Series modules.
Online diagnostics (GOLD) | 5.1(2) | Added support for the SnakeLoopback test on F1 Series modules.
Online diagnostics (GOLD) | 5.1(1) | Added support for the FIPS and BootupPortLoopback tests.
Online diagnostics (GOLD) | 4.2(1) | Added support for the PortLoopback, StatusBus, and StandbyFabricLoopback tests.
Online diagnostics (GOLD) | 4.0(1) | This feature was introduced.