Nexus 7000 Series M132XP-12 Module Troubleshooting Based on Error Logs

Document ID: 116227

Updated: Oct 21, 2013

Contributed by Yogesh Ramdoss, Robert Hurst, Vincent Ng, Cisco TAC Engineers.

Introduction

This document describes the process that is used in order to determine whether a Cisco Nexus 7000 Series (N7K) M132XP-12 or an N7K-M132XP-12L module needs to be replaced through Return Material Authorization (RMA).

Prerequisites

Requirements

Cisco recommends that you have knowledge of the Nexus operating system CLI.

Components Used

The information in this document is based on the N7K M132XP-12 Linecard.

The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, make sure that you understand the potential impact of any command.

Background Information

When a hardware failure is suspected on the N7K-M132XP-12 module, the actual cause might be a software defect, in which case an RMA is not required.

This document lists the symptoms experienced, and provides the troubleshooting steps required in order to determine the health of the module.

Scenario 1: N7K-M132XP-12 Diagnostic "Port Loopback" Test Failed

Symptoms

The module experiences diagnostic failure, and this syslog is observed:
%DIAG_PORT_LB-2-PORTLOOPBACK_TEST_FAIL: Module:18 Test:
PortLoopback failed 10 consecutive times. Faulty module:
Module 18 affected ports:23 Error:Loopback test failed.
Packets lost on the LC at the Queueing engine ASIC

N7k# show diagnostic result module 18


Current bootup diagnostic level: complete
Module 18: 10 Gbps Ethernet Module


        Test results: (. = Pass, F = Fail, I = Incomplete,
        U = Untested, A = Abort, E = Error disabled)


         1) EOBCPortLoopback--------------> .
         2) ASICRegisterCheck-------------> E
         3) PrimaryBootROM----------------> .
         4) SecondaryBootROM--------------> .
         5) PortLoopback:


          Port   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
          -----------------------------------------------------
                 U  U  I  I  I  I  I  I  U  U  I  .  I  .  I  .


          Port  17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
          -----------------------------------------------------
                 U  U  .  .  U  U  E  .  U  U  I  I  I  I  I  I


         6) RewriteEngineLoopback:


          Port   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
          -----------------------------------------------------
                 .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .


          Port  17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
          -----------------------------------------------------
                 .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .


"show module"
N7k# show module
Mod  Ports  Module-Type                      Model              Status
---  -----  -------------------------------- ------------------ ------------
16   32     10 Gbps Ethernet Module          N7K-M132XP-12      ok
17   32     10 Gbps Ethernet Module          N7K-M132XP-12      ok
18   32     10 Gbps Ethernet Module          N7K-M132XP-12      ok

        
Mod  Sw              Hw
---  --------------  ------
16   4.2(6E5)        2.0    
17   4.2(6E5)        1.7    
18   4.2(6E5)        1.7    


Mod  MAC-Address(es)                         Serial-Num
---  --------------------------------------  ----------
16   50-3d-e5-b8-5e-10 to 50-3d-e5-b8-5e-34  JAF1504CPAR
17   88-43-e1-c7-0b-90 to 88-43-e1-c7-0b-b4  JAF1405BJLJ
18   88-43-e1-c7-60-c0 to 88-43-e1-c7-60-e4  JAF1405CLML


Mod  Online Diag Status
---  ------------------
16   Fail
17   Pass
18   Fail
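
The per-port rows in the show diagnostic result output above are easy to misread by eye. The following is a minimal Python sketch, not a Cisco tool, that maps each port to its result code for one test section. It assumes the command output was captured as plain text in the layout shown above; the function name parse_port_results and the SAMPLE string are hypothetical names introduced only for this illustration.

```python
import re

# Excerpt of the "show diagnostic result module 18" output shown in this document.
SAMPLE = """\
         4) SecondaryBootROM--------------> .
         5) PortLoopback:

          Port   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
          -----------------------------------------------------
                 U  U  I  I  I  I  I  I  U  U  I  .  I  .  I  .

          Port  17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
          -----------------------------------------------------
                 U  U  .  .  U  U  E  .  U  U  I  I  I  I  I  I

         6) RewriteEngineLoopback:
"""


def parse_port_results(output, test_name):
    """Return {port number: result code} for one per-port test section."""
    results = {}
    in_section = False
    ports = []
    for line in output.splitlines():
        stripped = line.strip()
        header = re.match(r"(\d+)\)\s*([A-Za-z]+)", stripped)
        if header:
            # A new numbered test starts; remember whether it is the wanted one.
            in_section = header.group(2) == test_name
            ports = []
            continue
        # Skip lines outside the section, blank lines, and dashed separators.
        if not in_section or not stripped or set(stripped) == {"-"}:
            continue
        tokens = stripped.split()
        if tokens[0] == "Port":
            ports = [int(t) for t in tokens[1:]]
        elif ports and len(tokens) == len(ports):
            # Pair the result row with the most recent "Port" header row.
            results.update(zip(ports, tokens))
            ports = []
    return results


port_results = parse_port_results(SAMPLE, "PortLoopback")
error_disabled = sorted(p for p, code in port_results.items() if code == "E")
print(error_disabled)
```

For the excerpt above, the only port in the E (Error disabled) state is port 23, which agrees with the affected port named in the syslog message.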

Checklist

This scenario is likely due to Cisco Bug ID CSCtn81109 or CSCti95293.

In order to verify whether the problem is caused by a software defect or by an actual hardware failure that requires RMA, complete these steps:

  1. Check whether the NX-OS version that runs on the switch is an affected version listed in the Distributed Defect Tracking System (DDTS) records for these defects. Both DDTS are fixed and verified in Version 5.2.4.

  2. Enter the show log command when the diagnostic message is observed in order to view the time stamp of the diagnostic test failure. Determine if there are any CPU issues that occurred near the same time. Sometimes when the CPU is overwhelmed, it causes the diagnostic port loopback test to fail. This is a good data point to collect even though it might not be the cause of the problem.

  3. Collect additional CLI data with these commands (the examples use module 1; substitute the number of the affected module):

    tac-pac bootflash:tech.txt
    show tech module 1
    show tech gold
    show hardware internal errors module 1 | diff

    Note: Issue the show hardware internal errors module 1 | diff command a few times.

  4. Clear the diagnostic result and rerun it while the CPU is not overwhelmed with these commands:

    # show diagnostic result module 1
    # diagnostic clear result module all
    (config)# no diagnostic monitor module 1 test 5

    Note: You might need to check the test number in order to ensure that it is the PortLoopback test. In the 5.x code base it could be test 5, whereas in the 6.0 code base it could be test 6.


    (config)# diagnostic monitor module 1 test 5
    # diagnostic start module 1 test 5
    # show diagnostic result module 1 test 5

    Note: It could take a few minutes before the test is completed.


    # show module internal exceptionlog module 1
    # show module internal event-history errors
    # show hardware internal errors module 1

    If the module recovers and the diagnostic test passes, the failure is likely due to one of the DDTS mentioned above, because an actual hardware failure should fail diagnostics consistently.

    Note: If the module fails the diagnostic test consistently, you might have an actual hardware failure, so contact the Cisco Technical Assistance Center (TAC) for further help.
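
Because the PortLoopback test number can differ between releases, it can help to look the number up from the show diagnostic result output before the commands in step 4 are entered. The following is a minimal Python sketch under that assumption. The function names find_test_id and rerun_commands and the SAMPLE string are hypothetical; the command strings and prompts are copied from step 4 with only the module and test numbers substituted.

```python
import re

# Numbered test lines as they appear in "show diagnostic result" in this document.
SAMPLE = """\
         1) EOBCPortLoopback--------------> .
         2) ASICRegisterCheck-------------> E
         3) PrimaryBootROM----------------> .
         4) SecondaryBootROM--------------> .
         5) PortLoopback:
         6) RewriteEngineLoopback:
"""


def find_test_id(diag_output, test_name):
    """Return the numeric ID printed in front of test_name, or None."""
    for line in diag_output.splitlines():
        match = re.match(r"\s*(\d+)\)\s*([A-Za-z]+)", line)
        # Compare the whole name so "EOBCPortLoopback" does not match "PortLoopback".
        if match and match.group(2) == test_name:
            return int(match.group(1))
    return None


def rerun_commands(module, test_id):
    """Build the step 4 command sequence for one module and test ID."""
    return [
        f"# show diagnostic result module {module}",
        "# diagnostic clear result module all",
        f"(config)# no diagnostic monitor module {module} test {test_id}",
        f"(config)# diagnostic monitor module {module} test {test_id}",
        f"# diagnostic start module {module} test {test_id}",
        f"# show diagnostic result module {module} test {test_id}",
    ]


test_id = find_test_id(SAMPLE, "PortLoopback")
for command in rerun_commands(18, test_id):
    print(command)
```

The sketch only prints the commands for review. It does not connect to a switch, and the commands should still be entered manually while the CPU is not overwhelmed.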

Scenario 2: M1 Modules Get Reset and/or Link Flaps

Symptoms

N7k %$ VDC-1 %$ %DIAG_PORT_LB-2-PORTLOOPBACK_TEST_FAIL: Module:3
Test:PortLoopback failed 10 consecutive times. Faulty module:
affected ports:3,5,7,11,13,15,19,21,23,27,29,31  Error:Loopback test failed.
Packets lost on the LC at the MAC ASIC

N7k %$ VDC-1 %$ %DIAG_PORT_LB-2-PORTLOOPBACK_TEST_FAIL:  Module:3
Test:PortLoopback failed 10 consecutive times. Faulty module:
affected ports:4,6,8,12,14,16,20,22,24,26,28,30,32  Error:Loopback test failed.
Packets lost on the LC at the Queueing engine ASIC
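
When many of these messages are logged, it can be useful to summarize which ports failed at which ASIC. The following is a minimal Python sketch that extracts the module number, the affected ports, and the ASIC name from PORTLOOPBACK_TEST_FAIL messages. It assumes that the messages keep the wording shown above; the names LoopbackFailure, parse_failures, ports_by_asic, and SAMPLE_LOG are hypothetical and exist only in this illustration.

```python
import re
from typing import NamedTuple

# The two syslog messages shown above, including their line wrapping.
SAMPLE_LOG = """\
N7k %$ VDC-1 %$ %DIAG_PORT_LB-2-PORTLOOPBACK_TEST_FAIL: Module:3
Test:PortLoopback failed 10 consecutive times. Faulty module:
affected ports:3,5,7,11,13,15,19,21,23,27,29,31  Error:Loopback test failed.
Packets lost on the LC at the MAC ASIC

N7k %$ VDC-1 %$ %DIAG_PORT_LB-2-PORTLOOPBACK_TEST_FAIL:  Module:3
Test:PortLoopback failed 10 consecutive times. Faulty module:
affected ports:4,6,8,12,14,16,20,22,24,26,28,30,32  Error:Loopback test failed.
Packets lost on the LC at the Queueing engine ASIC
"""


class LoopbackFailure(NamedTuple):
    module: int
    ports: list
    asic: str


_FAIL_RE = re.compile(
    r"PORTLOOPBACK_TEST_FAIL:\s*Module:(?P<module>\d+)\s+"
    r"Test:\s*PortLoopback failed \d+ consecutive times\."
    r".*?affected ports:(?P<ports>[\d,]+)\s+"
    r"Error:.*?Packets lost on the LC at the (?P<asic>.+?) ASIC"
)


def parse_failures(log_text):
    """Return one LoopbackFailure per PORTLOOPBACK_TEST_FAIL message."""
    # Collapse all whitespace so wrapped messages become a single line.
    flat = " ".join(log_text.split())
    return [
        LoopbackFailure(
            module=int(m["module"]),
            ports=[int(p) for p in m["ports"].split(",")],
            asic=m["asic"],
        )
        for m in _FAIL_RE.finditer(flat)
    ]


def ports_by_asic(failures):
    """Group the affected ports by the ASIC named in each message."""
    grouped = {}
    for failure in failures:
        grouped.setdefault(failure.asic, set()).update(failure.ports)
    return {asic: sorted(ports) for asic, ports in grouped.items()}


failures = parse_failures(SAMPLE_LOG)
summary = ports_by_asic(failures)
for asic, ports in summary.items():
    print(asic, ports)
```

The same pattern also matches the Scenario 1 message, where the test name is wrapped onto the line after Test: and the affected port list contains a single port.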

Checklist

This problem is likely due to Cisco Bug ID CSCtt43115. It is NOT a hardware failure, and no replacement is required.

Collect all of the logs reported and the sequence of events that occurred. Use these commands:

show tech detail
show accounting log
show logging

Ensure that the configuration, specifically Switched Port Analyzer (SPAN), and the symptoms match those described in the Release Notes enclosure of the DDTS.

Note: This issue applies to all M1 module types.

Scenario 3: All M1 Modules Fail Specific Diagnostic Tests, Like the PortLoopback or RewriteEngineLoopback Tests

Symptoms

This issue occurs when there is a problem between the Active Supervisor (Sup) engine and the Xbar module, which results in corruption of the diagnostic packet. The N7K switch might report that multiple or all ports on multiple or all modules fail these tests.

This issue requires manual investigation and isolation of the faulty Sup engine.

The condition that caused the tests to go into the ErrDisabled state might be transient. Cisco recommends that you run the tests on-demand in order to determine if the condition persists.

In order to clear the ErrDisabled state of the test, enter:

N7K# diagnostic clear result module 1 test ?
  <1-6>  Test ID(s)
  all    Select all

In order to run the on-demand test, enter:

N7K# diagnostic start module <mod#> test <test#>

In order to stop the test, enter:

N7K# diagnostic stop module <mod#> test <test#>
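
Whether the condition persists can be judged from the results of several on-demand runs. The following is a minimal Python sketch of that reasoning. It assumes that the result code of each run was recorded by hand with the legend from the show diagnostic result output (. = Pass, F = Fail, E = Error disabled, and so on). The function name classify_runs and the category labels are hypothetical and are not NX-OS terms.

```python
FAILING = {"F", "E"}       # Fail, Error disabled
CONCLUSIVE = FAILING | {"."}  # "." = Pass; I, U, and A say nothing about health


def classify_runs(results):
    """Classify a sequence of on-demand result codes, oldest run first."""
    conclusive = [code for code in results if code in CONCLUSIVE]
    if not conclusive:
        return "inconclusive"   # only Incomplete, Untested, or Abort results
    failed = [code for code in conclusive if code in FAILING]
    if not failed:
        return "passing"
    if len(failed) == len(conclusive):
        return "persistent"     # every conclusive run failed
    return "transient"          # a mix of passes and failures


print(classify_runs(["E", ".", "."]))
print(classify_runs(["F", "F", "F"]))
```

A persistent result is the case that calls for manual isolation of the faulty Sup engine or a TAC case. A transient result is consistent with the statement above that the original condition might not persist.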

The Sup engine does not currently trigger a failover or reset as a corrective action in order to recover from this condition. An enhancement request has been filed in order to request corrective action: Cisco Bug ID CSCth03474 - n7k/GOLD:Improve Fault Isolation of N7K-GOLD.
