This document describes what to do if you experience input discards on Fibre Channel over Ethernet (FCoE) multihop interfaces. This problem/solution document is useful when discard symptoms are identified on interfaces that interconnect remote datacenters.
This example depicts a real-life scenario of this issue.
The topology in this example depicts two datacenters, DC1 and DC2, separated by 10KM and connected by a 10KM FCoE Virtual Expansion (VE) (multihop) interface. The multihop interfaces are configured on N7K-F132XP-15 linecards. Per the F1 Series datasheet, this distance should have been within the supported range.
Initially, the datasheet listed IEEE Data Center Bridging (DCB) features that included a lossless link distance of 20KM. Cisco bug ID CSCts72420 was filed in order to correct the documentation; the line regarding the 20KM lossless link distance was removed from the datasheet.
The EMC VPLEX devices support a storage replication feature; this scenario used synchronous replication. When the EMC VPLEX devices were upgraded, they became 'out of sync'. After the VPLEX upgrade, the devices began to replicate high volumes of data over the 10KM FCoE multihop link.
When data replication increased, pause frames began to propagate through the fabric. This is expected FCoE flow-control behavior at a high level: the pause frames received from Nexus 5000-DC2 indicate congestion on an end device, and as ingress buffers fill, pause frames trickle back into the fabric.
The issue in this scenario is that the Nexus 7000-DC2 constantly discarded packets on ingress over the 10KM multihop link.
Ethernet4/1 is up
Dedicated Interface
Hardware: 1000/10000 Ethernet, address: XXXX.XXXX.XXXX (bia XXXX.XXXX.XXXX)
MTU bytes (CoS values): 9216(0-2,4-7) 2112(3)
BW 10000000 Kbit, DLY 10 usec, reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA
Port mode is trunk
full-duplex, 10 Gb/s, media type is 10G
Beacon is turned off
Auto-Negotiation is turned on
Input flow-control is off, output flow-control is off
Rate mode is dedicated
Switchport monitor is off
EtherType is 0x8100
Last link flapped 25week(s) 0day(s)
Last clearing of "show interface" counters 79w2d
30 seconds input rate 296186536 bits/sec, 27891 packets/sec
30 seconds output rate 151677360 bits/sec, 19294 packets/sec
Load-Interval #2: 5 minute (300 seconds)
input rate 289.58 Mbps, 27.61 Kpps; output rate 165.20 Mbps, 20.05 Kpps
RX
566235497816 unicast packets 2504479 multicast packets 0 broadcast packets
566239834433 input packets 502487779153524 bytes
219280594774 jumbo packets 0 storm suppression packets
0 runts 0 giants 0 CRC 0 no buffer
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
0 watchdog 0 bad etype drop 0 bad proto drop 0 if down drop
0 input with dribble 19312516 input discard <<<<<<<<<<<<<<<<<
1832141 Rx pause
TX
681040135255 unicast packets 2504251 multicast packets 0 broadcast packets
681046392756 output packets 744942450903588 bytes
333793360248 jumbo packets
0 output error 0 collision 0 deferred 0 late collision
0 lost carrier 0 no carrier 0 babble 0 output discard
3753250 Tx pause
5 interface resets
This should not happen, because the interface carries only FCoE (CoS 3) traffic. Input discards violate the 'no-drop' QoS policy for FCoE. Furthermore, discards in an FCoE environment can lead to SCSI aborts, errors, and other upper-layer problems.
When a device sends a pause frame, the interface that generates it should have an ingress queue with enough buffer space to absorb two times the link distance worth of data. At the moment the pause is generated, the wire might be full; by the time the adjacent device receives and processes the pause frame, the wire might be full again. Thus, the device that generates the pause should have the ability to buffer two times the link distance.
Upon calculation, there could have been more than 100 packets in flight over the 10KM link. Due to an ASIC limitation, the F1 Series linecard cannot support lossless FCoE on links of 10KM or greater.
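As a rough sanity check (the ~5 microseconds per kilometre propagation delay of fibre is an assumed typical value, not a measured one), the buffer required to remain lossless over this link is:

required buffer = 2 x link distance x propagation delay per km x line rate
               = 2 x 10 km x 5 usec/km x 10 Gbps
               = 100 usec x 10 Gbps
               = 1,000,000 bits, or about 125,000 bytes

At the 2,112-byte FCoE maximum frame size, that is roughly 59 full-sized frames; with smaller frames, well over 100 packets can be in flight at once.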
Cisco bug ID CSCua10484 added long-haul lossless distance support for F2 Series modules. In NX-OS Release 6.1(2) and later, the configuration changes described in this section are allowed.
The space left in the ingress buffer (IB) to catch in-flight packets can be calculated as PL_STOP - PL_PAUSE. By default, the PL_STOP and HWM (PL_PAUSE) values are the same, which leaves no headroom once a pause is generated.
module-4# show hardware internal mac port 1 qos configuration | begin IB | end EB
IB
Port page limit : 3584 (1376256 Bytes)
VL# HWM pages(bytes) LWM pages(bytes) Used PL_STOP(HWM & LWM)
pages THR
0 1107 ( 425088) 1059 ( 406656) 0 1107 1059
1 2 ( 768) 1 ( 384) 0 2 1
2 1107 ( 425088) 1059 ( 406656) 0 1107 1059
3 1053 ( 404352) 1029 ( 395136) 0 1053 1029
4 2 ( 768) 1 ( 384) 0 2 1
5 231 ( 88704) 159 ( 61056) 0 231 159
6 2 ( 768) 1 ( 384) 0 2 1
7 2 ( 768) 1 ( 384) 0 2 1
Credited DWRR WT: 216 (0xd8) Uncredited DWRR WT: 144 (0x90)
DWRR honor UC = FALSE
Leak Lo weight = 0xd8, enabled = FALSE
EB
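Applied to the default output above: for VL3, the FCoE no-drop class, HWM = 1053 pages and PL_STOP = 1053 pages, so PL_STOP - PL_PAUSE = 0 pages. Once the pause threshold is reached, there is no headroom left to absorb frames already in flight on a long link.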
You can modify these values in order to support a greater distance through the allocation of larger buffers to the no-drop Class of Service (CoS). In order to do this, first duplicate the 'default-4q-7e-in-policy' Quality of Service (QoS) policy-map.
In Default and Storage VDC
Switch(config)# qos copy policy-map type queuing ?
*** No matching command found in current mode, matching in (exec) mode ***
default-4q-7e-in-policy Default 7-ethernet input queuing policy
default-4q-7e-out-policy Default 7-ethernet output queuing policy
Switch(config)# qos copy policy-map type queuing default-4q-7e-in-policy prefix 7I_
After the policy is copied in both the default VDC and the storage VDC, modify the copied '7I_4q-7e-in' policy-map in order to allocate a greater queue-limit percentage to the no-drop CoS.
In Default and Storage VDC
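A minimal sketch of the modification follows; it assumes the default class names (c-4q-7e-drop-in and c-4q-7e-ndrop-in), which you can verify with the 'show policy-map type queuing' command:

Switch(config)# policy-map type queuing 7I_4q-7e-in
Switch(config-pmap-que)# class type queuing c-4q-7e-drop-in
Switch(config-pmap-c-que)# queue-limit percent 1
Switch(config-pmap-c-que)# class type queuing c-4q-7e-ndrop-in
Switch(config-pmap-c-que)# queue-limit percent 99

The resulting configuration shows the new queue limits: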
Switch(config)# show run ipqos
<snippet>
policy-map type queuing 7I_4q-7e-in
  class type queuing c-4q-7e-drop-in
    service-policy type queuing 7I_4q-7e-drop-in
    queue-limit percent 1 <<<<<<<<<<<<<<<<<
  class type queuing c-4q-7e-ndrop-in
    service-policy type queuing 7I_4q-7e-ndrop-in
    queue-limit percent 99 <<<<<<<<<<<<<<<<<
Now, apply the modified QoS policy to the desired interface:
In Storage VDC
Switch(config)# int e4/1
Switch(config-if)# service-policy type queuing input 7I_4q-7e-in
Switch(config-if)# show run int e4/1
!Command: show running-config interface Ethernet4/1
!Time: Sun Mar 2 21:03:07 2014
version 6.1(4)
interface Ethernet4/1
switchport
switchport mode trunk
switchport trunk allowed vlan 1,2990
load-interval counter 2 30
service-policy type queuing input 7I_4q-7e-in
no shutdown
Notice that the PL_STOP value is now greater than the High Water Mark (HWM), which allows a greater buffering capability in the IB.
module-4# show hardware internal mac port 1 qos configuration | begin IB | end EB
IB
Port page limit : 3584 (1376256 Bytes)
VL# HWM pages(bytes) LWM pages(bytes) Used PL_STOP(HWM & LWM)
pages THR
0 15 ( 5760) 9 ( 3456) 0 15 9
1 2 ( 768) 1 ( 384) 0 2 1
2 15 ( 5760) 9 ( 3456) 0 15 9
3 1161 ( 445824) 1137 ( 436608) 0 3521 1137
4 2 ( 768) 1 ( 384) 0 2 1
5 3 ( 1152) 0 ( 0) 0 3 0
6 2 ( 768) 1 ( 384) 0 2 1
7 2 ( 768) 1 ( 384) 0 2 1
Credited DWRR WT: 216 (0xd8) Uncredited DWRR WT: 144 (0x90)
DWRR honor UC = FALSE
Leak Lo weight = 0xd8, enabled = FALSE
EB
In the example, the space left in the IB = 3521 pages - 1161 pages = 2360 pages, which at 384 bytes per page equals 906,240 bytes.
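As a rough cross-check, with the same assumed propagation delay of ~5 usec/km, a 10 Gb/s link carries about 12,500 bytes of round-trip in-flight data per kilometre (2 x 5 usec x 10 Gbps). The 906,240 bytes of headroom therefore covers roughly 72KM, comfortably more than the 10KM link in this scenario.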
Alternatively, use native Fibre Channel (FC) between the sites if it is available. This solution requires either Coarse Wavelength Division Multiplexing/Dense Wavelength Division Multiplexing (CWDM/DWDM) equipment or dark fibre, dependent upon the required distance.
Revision | Publish Date | Comments
---|---|---
1.0 | 17-Jul-2014 | Initial Release