This document describes the SpineControlBus test and provides an action to take when the test fails.
The SpineControlBus test is a diagnostic test that checks the standby control bus connectivity from the Spine card to the Supervisor card. The Spine card is also known by other names such as Xbar or Fabric. There are two control buses from each of the Supervisor modules to every Spine card. Only one of them is used, while the other is kept as a backup in case the primary fails.
This is a non-disruptive test. This test is auto-disabled after 20 consecutive failures. Failure of this test is not considered catastrophic, but it is an indication of 'reduced' high-availability for that Supervisor-Spinecard pair.
What is the recommended action to take when the SpineControlBus test fails?
Rule out Cisco bug ID CSCuc72466 - Spine Control Bus fail in both active and standby.
The SpineControlBus test accesses the scratch register on the Spine card to verify both active and standby access and thereby determine whether the spine works. However, only one access can be made at a time. When both the active and the standby supervisor run the test at the same time, one of the tests (usually the standby test) fails. The failure is a false alarm and not an indication of an actual hardware failure.
Apply this workaround in order to ensure the test is not executed by the active and standby supervisor at the same time:
N7K(config)# diagnostic monitor interval module <supervisor_slot_number> test SpineControlBus hour 0 min 0 second 31
N7K# diagnostic clear result module <supervisor_slot_number> test 11
Enter the show diagnostic content module X command in order to determine the test ID of SpineControlBus.
Another workaround is to disable the test on the standby supervisor.
Continue to monitor the test with the show diagnostic result module X test SpineControlBus detail command.
Cisco bug ID CSCuc72466 is fixed in NX-OS Release 6.2.
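One way to see why an interval of 31 seconds helps is to compute how often two periodic schedules start in the same second. The sketch below is illustrative only. It assumes the other supervisor keeps a 30-second interval (the spacing between the last and next execution times in the example output in this document) and that both schedules start aligned. The function names are hypothetical and are not part of NX-OS.

```python
from math import gcd


def seconds_between_coincidences(interval_a: int, interval_b: int) -> int:
    """Return the least common multiple of two test intervals, in seconds.

    Two schedules that start together start together again after this time.
    """
    return interval_a * interval_b // gcd(interval_a, interval_b)


def runs_between_coincidences(interval_a: int, interval_b: int) -> int:
    """Return how many runs of schedule A elapse between simultaneous starts."""
    return seconds_between_coincidences(interval_a, interval_b) // interval_a


if __name__ == "__main__":
    # Assumption: 30 s on one supervisor, 31 s on the other (the workaround value).
    print(seconds_between_coincidences(30, 31))
    print(runs_between_coincidences(30, 31))
```

Under these assumptions, identical intervals collide on every run, while intervals of 30 and 31 seconds share a start time far less often. How long each test run takes is not modeled here.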
If the failure occurs again after the above bug has been ruled out, take these actions:
If the test has failed multiple times, this could indicate a faulty supervisor. If the active supervisor fails the test, try a supervisor switchover; if the standby supervisor fails the test, reload the standby supervisor. If the problem clears, replace the supervisor.
Otherwise, although not common, this could indicate an issue with multiple Spine cards or multiple bus failures. If a single Spine card has failed, move that Spine card to a different known good slot to see if the problem follows the Spine card. If it does, replace the Spine card. Otherwise, this indicates a problem with the bus, and the chassis should be replaced.
Example
Nexus7000# show diagnostic result module 5 test SpineControlBus detail
Module 5: Supervisor module-1X (Active)
11) SpineControlBus E
Error code ------------------> DIAG TEST ERR DISABLE
Total run count -------------> 676018
Last test execution time ----> Tue May 14 18:30:47 2013
First test failure time -----> Sat Oct 13 17:55:06 2012
Last test failure time ------> Tue May 14 18:30:47 2013
Last test pass time ---------> Tue May 14 18:30:17 2013
Total failure count ---------> 30
Consecutive failure count ---> 1
Last failure reason ---------> Spine control test failed
Next Execution time ---------> Tue May 14 18:31:17 2013
XBar 1 2 3
     F F F
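When the test is monitored over time, it can help to pull the counters out of the detail output programmatically. The following is a minimal sketch, assuming the "label ---> value" layout seen in the example above; the layout on other releases may differ. The names parse_diag_detail and FIELD_RE are hypothetical helpers, and SAMPLE is an excerpt of the example output.

```python
import re

# Excerpt of the example output shown above.
SAMPLE = """\
Module 5: Supervisor module-1X (Active)
11) SpineControlBus E
Error code ------------------> DIAG TEST ERR DISABLE
Total run count -------------> 676018
Total failure count ---------> 30
Consecutive failure count ---> 1
Last failure reason ---------> Spine control test failed
XBar 1 2 3
F F F
"""

# A label made of letters and spaces, a run of dashes ending in '>', then the value.
FIELD_RE = re.compile(r"^\s*(?P<label>[A-Za-z ]+?)\s*-+>\s*(?P<value>.*\S)\s*$")


def parse_diag_detail(text: str) -> dict:
    """Map each lower-cased 'label ---> value' label to its value string.

    Lines that do not follow that layout (module header, XBar rows) are skipped.
    """
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            fields[match.group("label").lower()] = match.group("value")
    return fields


if __name__ == "__main__":
    for label, value in parse_diag_detail(SAMPLE).items():
        print(f"{label}: {value}")
```

Values are kept as strings so that the caller decides how to convert them, for example int(fields["total failure count"]) for the counters.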