MDS 9000 Series Fibre Channel Port Link Event "LR Rcvd B2B" Troubleshoot

Document ID: 116400

Updated: Sep 27, 2013

Contributed by Edward Mazurek, Cisco TAC Engineer.

Introduction

This document describes the "LR Rcvd B2B" link failure seen on Cisco Multilayer Director Switch (MDS) 9000 Series Fibre Channel (FC) ports and explains how to find its cause.

Problem

The Link Events log displays this entry:

*************** Port Config Link Events Log ***************
----                            ------   -----  -----  ------
Time                            PortNo   Speed  Event  Reason
----                            ------   -----  -----  ------
...
Jul 28 00:46:39 2012  00670297  fc11/25  ---    DOWN   LR Rcvd B2B

The LR Rcvd B2B (or Link failure Link Reset failed nonempty recv queue) message indicates that the device attached to the port transmits a Link Reset (LR) to the MDS, but the MDS does not respond with a Link Reset Response (LRR) due to internal congestion on the port. The port has packets queued that are received from the attached device, but the MDS cannot deliver them to the appropriate egress port. Since they are still queued at the ingress port, the MDS cannot send back an LRR, and the link fails.

These error messages accompany the previous event log:

%PORT-2-IF_DOWN_LINK_FAILURE: %$VSAN 93%$ Interface fc11/25 is down (Link failure)

%PORT-5-IF_DOWN_LINK_FAILURE: %$VSAN 100%$ Interface fc5/32 is down (Link failure Link Reset failed nonempty recv queue)
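When many ports are involved, it can help to pull the affected ports out of the Link Events log with a script. The following is a minimal Python sketch, not a Cisco tool. The regular expression is an assumption derived only from the single sample entry shown earlier, the function names are hypothetical, and the meaning of the numeric field after the timestamp is not stated in this document, so it is kept as an opaque string.

```python
import re
from datetime import datetime

# Layout assumed from the one sample entry in this document; other
# platforms or releases may print the log differently.
LINK_EVENT = re.compile(
    r"^(?P<time>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\d{4})\s+"
    r"(?P<extra>\d+)\s+"          # numeric field, meaning not documented here
    r"(?P<port>fc\d+/\d+)\s+"
    r"(?P<speed>\S+)\s+"
    r"(?P<event>\S+)\s+"
    r"(?P<reason>.+?)\s*$"
)


def parse_link_event(line):
    """Return the fields of one log entry as a dict, or None if it does not match."""
    match = LINK_EVENT.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Collapse repeated spaces so strptime sees a single-space timestamp.
    stamp = " ".join(fields["time"].split())
    fields["time"] = datetime.strptime(stamp, "%b %d %H:%M:%S %Y")
    return fields


def ports_down_lr_rcvd_b2b(lines):
    """Return (time, port) for every DOWN event whose reason is LR Rcvd B2B."""
    hits = []
    for line in lines:
        event = parse_link_event(line)
        if event and event["event"] == "DOWN" and event["reason"] == "LR Rcvd B2B":
            hits.append((event["time"], event["port"]))
    return hits
```

Sorting the returned list by time makes it easier to see whether several links failed at approximately the same moment.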

Note: This scenario assumes that the MDS grants the FC device three buffer credits, and that the FC device's packets are switched to the egress FC port.

                MDS
   FC Port                FC Port
   (Egress)   Arbiter    (Ingress)      FC device
   --------   -------    ---------      ---------
 1)                           <------- FC packet 1
 2)             <--- Grant Request
 3)             Grant------------>
 4)    <---------------FC packet 1
 5)                            R_Rdy-------->       Tx B2B=3
 6)                           <------- FC packet 2  Tx B2B=2
 7)             <---- Grant Request
 8)                           <------- FC packet 3  Tx B2B=1
 9)             <---- Grant Request
10)                           <------- FC packet 4  Tx B2B=0
11)             <---- Grant Request
12) Time lapses - Variable depending on attached HBA type
13)                           <--------Link Reset(LR)
14)          Start 90ms "LR Rcvd B2B" timer
15)          "LR Rcvd B2B" timer expires
16)                            <--------NOS-------->

Explanation

This section explains the previous output:

  1. The FC device transmits an FC packet to the ingress port, destined to the egress port.
  2. The MDS ingress Line Card (LC) port determines the Destination Index (DI), and transmits the Grant Request to the arbiter (Bellagio2) on the Active Supervisor.
  3. The arbiter sends back a Grant to the ingress port, which gives it permission to transmit FC packet 1 to the egress port through the XBAR.
  4. The ingress LC transmits FC packet 1 through the XBAR to the egress port. This makes the ingress buffer available.
  5. The ingress port transmits an R_RDY back to the FC device, which replenishes credit.

    Note: The first five steps are typical when there is no congestion. Assume at this point that the egress port queues are full and cannot receive any more packets.


  6. The FC device transmits FC packet 2 to the ingress port, destined to the egress port.
  7. The MDS ingress LC port determines the DI, and transmits the Grant Request to the arbiter (Bellagio2) on the Active Supervisor.
  8. The FC device transmits FC packet 3 to the ingress port, destined to the egress port.
  9. The MDS ingress LC port determines the DI, and transmits the Grant Request to the arbiter (Bellagio2) on the Active Supervisor.
  10. The FC device transmits FC packet 4 to the ingress port, destined to the egress port.
  11. The MDS ingress LC port determines the DI, and transmits the Grant Request to the arbiter (Bellagio2) on the Active Supervisor.
  12. Time lapses, which varies based on the attached HBA type.
  13. After some time at Tx B2B=0, the FC device initiates Credit Loss Recovery, and transmits a Link Reset (LR).
  14. When the ingress port receives the LR, it checks its ingress buffers and determines that there is at least one packet queued. It then starts a 90 ms LR Rcvd B2B timer.
  15. If the Grants are received and the three FC packets are transmitted to the egress port before the timer expires, then the LR Rcvd B2B timer is canceled, and a Link Reset Response (LRR) is sent back to the FC device. In this case, however, the egress port remains congested, and the three FC packets remain queued at the ingress port. The LR Rcvd B2B timer expires, and an LRR is not transmitted back to the FC device.
  16. Both the ingress port and the FC device initiate a link failure via transmission of a Not Operational Sequence (NOS).
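The sequence above can be replayed with a small model. This is an illustrative Python sketch under stated assumptions, not the MDS implementation: the class and method names are hypothetical, `drain_ms` is an invented input that stands for "how long after the LR the queued frames get their Grants", the immediate LRR for an empty queue is inferred from step 14, and the behavior at exactly 90 ms is a guess.

```python
LR_RCVD_B2B_TIMER_MS = 90  # timer length from step 14


class IngressPort:
    """Toy model of one ingress port and the credits it grants to the FC device."""

    def __init__(self, credits=3):
        self.tx_b2b = credits  # Tx B2B credits left at the FC device
        self.queue = []        # frames held in ingress buffers, awaiting a Grant

    def receive_frame(self, frame):
        """FC device sends one frame and spends one credit."""
        if self.tx_b2b == 0:
            raise RuntimeError("FC device has no credit left to transmit")
        self.tx_b2b -= 1
        self.queue.append(frame)

    def grant(self):
        """Arbiter Grant: oldest frame crosses the XBAR, R_RDY returns a credit."""
        frame = self.queue.pop(0)
        self.tx_b2b += 1
        return frame

    def link_reset(self, drain_ms=None):
        """Return "LRR" or "NOS" for an LR received now.

        drain_ms is a hypothetical input: milliseconds after the LR at which
        all queued frames receive their Grants, or None if they never do.
        """
        if not self.queue:
            return "LRR"  # nothing queued, so no timer is needed (inferred)
        if drain_ms is not None and drain_ms < LR_RCVD_B2B_TIMER_MS:
            return "LRR"  # queue emptied in time, timer canceled
        return "NOS"      # timer expired with frames still queued


port = IngressPort(credits=3)
port.receive_frame("FC packet 1")   # step 1
port.grant()                        # steps 2-5, Tx B2B returns to 3
for n in (2, 3, 4):                 # steps 6-11, egress port is now congested
    port.receive_frame(f"FC packet {n}")
outcome = port.link_reset()         # steps 13-16
print(port.tx_b2b, len(port.queue), outcome)
```

With three credits and a permanently congested egress port, the run ends with zero credits, three queued frames, and `NOS`, which matches the diagram. Calling `port.link_reset(drain_ms=40)` instead models the case where the Grants arrive in time and an LRR is returned.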

Solution

If the link failed with an LR Rcvd B2B or a Link failure Link Reset failed nonempty recv queue message, then the port that failed is not the cause of the slow drain. It was only affected by a slow/stuck port elsewhere in the path. In order to identify the slow/stuck port that caused the link failure, complete these steps:

  1. Determine if there is more than one link that fails due to the previously-mentioned issue. If more than one link fails at approximately the same time, then the problem might arise because all of the ports attempt to transmit packets to a common egress port.
  2. Check the VSAN zoning database in order to see with which devices the adjacent FC device is zoned. Map these to the egress E or local F ports. In order to map to egress E ports, use the show fspf internal route vsan <vsan> domain <dom> command. In order to map to local F ports, use the show flogi database vsan <vsan> command. If there is more than one link that fails with the LR Rcvd B2B message, then combine the egress E or local F ports found, and check for overlaps. Overlaps are likely causes of slow/stuck ports.
  3. Check the ports found in Step 2 for indications of slow-drain. Examples are:

    • Credit Loss (AK_FCP_CNTR_CREDIT_LOSS / FCP_SW_CNTR_CREDIT_LOSS)
    • 100 ms Tx B2B Zero (AK_FCP_CNTR_TX_WT_AVG_B2B_ZERO / FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO)
    • Timeout Discards (AK_FCP_CNTR_LAF_TOTAL_TIMEOUT_FRAMES / THB_TMM_TOLB_TIMEOUT_DROP_CNT / F16_TMM_TOLB_TIMEOUT_DROP_CNT)

  4. If you determine that the slow port is an egress E port, then continue the slow-drain troubleshoot on the adjacent switch indicated by the FSPF next-hop interface.
  5. If you determine that the slow/stuck port is an FCIP link or port-channel, then check the FCIP links for signs of IP retransmissions or other problems, such as link failures. Enter the show ips stats all command in order to check for problems.
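The overlap check in Step 2 amounts to counting how many failed links share each candidate egress port. The following Python sketch shows one way to do that by hand-built input; the function name is hypothetical, and the egress port names in the example are invented for illustration (only fc11/25 and fc5/32 appear in this document). The candidate sets would come from your own reading of the zoning, FSPF, and FLOGI output.

```python
from collections import Counter


def shared_egress_ports(candidates_by_failed_port):
    """Rank egress ports that are candidates for two or more failed links.

    candidates_by_failed_port maps each port that failed with LR Rcvd B2B to
    the set of egress E or local F ports its attached device can send to.
    Returns (egress_port, failed_link_count) pairs, most shared first.
    """
    counts = Counter()
    for egress_ports in candidates_by_failed_port.values():
        counts.update(set(egress_ports))  # count each egress port once per failed link
    shared = [(port, n) for port, n in counts.items() if n >= 2]
    return sorted(shared, key=lambda item: (-item[1], item[0]))


# Hypothetical mapping collected manually for two failed links.
candidates = {
    "fc11/25": {"fc1/1", "fc2/3"},
    "fc5/32": {"fc2/3", "fc7/7"},
}
print(shared_egress_ports(candidates))
```

In this made-up example only fc2/3 is common to both failed links, so it would be the first port to check for the slow-drain indications listed in Step 3.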

Configuration Options

Here are two possible system configuration options:

  • This timer determines how long the system holds frames that cannot be transmitted before it times them out. The default is 500 ms.

    system timeout congestion-drop <ms> mode E|F

  • This timer determines how long a port can remain at zero Tx credits before the system starts to drop frames at line rate. Frames are dropped until credits are received.

    system timeout no-credit-drop <ms> mode E|F
