interface port-channel1
  description vPC Peer-Link
  switchport mode trunk
  spanning-tree port type network
  vpc peer-link
Example vPC leading to a UCS Fabric Interconnect (FI), in this case bdsol-6120-05-A:
interface port-channel101
  description bdsol-6120-05-A
  switchport mode trunk
  spanning-tree port type edge trunk
  vpc 101
The following tests will be performed:
- Data link loss
- Disruptive upgrade
- In-Service Software Upgrade (ISSU)
- Loss of the peer keepalive link (the mgmt0 interface in this topology/configuration)
- Loss of the vPC peer link (port-channel 1 in this configuration)
- Disabling the vPC feature
Basic traffic flow.
A single iperf3 session is used to generate 6.5 gigabits per second of test TCP traffic to verify frame loss during transitions.
RedHat2 is pinned to Fabric Interconnect B while RedHat1 is pinned to Fabric Interconnect A. As a result, traffic between them needs to cross the switching portion.
Server: iperf3 -s -i 1
Client: iperf3 -c 10.37.9.131 -t 0 -i 1 -w 1M -V
The above parameters were picked to allow a high rate of traffic and to make packet loss easy to spot.
The TCP window is clamped to avoid the data bursts iperf is known for. Allowing iperf to run unclamped could result in occasional drops in ingress buffers along the path, depending on QoS configuration. The above parameters allow for a sustained rate of 6-7 Gbps without frame loss.
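The effect of the window clamp can be sanity-checked with the usual window/RTT bound: a TCP sender can have at most one window of data in flight per round trip. The following is a minimal sketch, assuming that "-w 1M" results in an effective window of 1 MiB and using hypothetical round-trip times (the RTT of the test path is not reported in this document).

```python
# Window-limited TCP throughput ceiling: window / RTT.
# The RTT values below are illustrative assumptions, not measurements.

WINDOW_BYTES = 1 * 1024 * 1024  # iperf3 -w 1M, assumed to mean 1 MiB


def max_throughput_gbps(window_bytes: int, rtt_seconds: float) -> float:
    """Return the throughput ceiling (Gbit/s) imposed by the window alone."""
    return window_bytes * 8 / rtt_seconds / 1e9


def rtt_for_rate_ms(window_bytes: int, rate_gbps: float) -> float:
    """Return the RTT (ms) at which the window alone would cap the given rate."""
    return window_bytes * 8 / (rate_gbps * 1e9) * 1e3


if __name__ == "__main__":
    for rtt_ms in (0.5, 1.0, 2.0):  # hypothetical round-trip times
        ceiling = max_throughput_gbps(WINDOW_BYTES, rtt_ms / 1e3)
        print(f"RTT {rtt_ms} ms -> at most {ceiling:.2f} Gbps")
    limit_ms = rtt_for_rate_ms(WINDOW_BYTES, 6.5)
    print(f"6.5 Gbps becomes window-limited at an RTT of about {limit_ms:.2f} ms")
```

Under these assumptions, a 1 MiB window limits the sender to a 6-7 Gbps rate only when the effective round-trip time, including host and queueing delay, is a little over a millisecond. With a shorter RTT, something other than the window is the limiting factor.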
To verify, check the cumulative rate of traffic on the interfaces.
The above output shows 7 Gbps of traffic entering on interface Ethernet 1/2 and leaving on interface Ethernet 1/1.
Data link loss
This test is designed to show how data traffic behaves if a link which is part of a vPC is shut down.
This example uses Ethernet 1/1, the output interface for the data traffic, which is shut down from the command line.
bdsol-n5548-07# conf t
Enter configuration commands, one per line. End with CNTL/Z.
bdsol-n5548-07(config)# int et1/1
bdsol-n5548-07(config-if)# shut
In this case only a single packet was lost out of a 6.5 Gbps stream.
Traffic is almost immediately rebalanced among the remaining links in the port channel on the UCS side. In this case it uses Ethernet 1/8 on UCS FI B (the only remaining port), which goes up to Nexus 5548 B; from there it is transported to UCS FI A over Ethernet 1/1.
After performing an ISSU, first on the primary and afterwards on the secondary vPC peer, no packets were lost.
This is because during an ISSU all data plane functionality remains undisrupted and only control plane traffic is affected.
Known Issues with ISSU
Layer 3 features and licenses.
During the ISSU testing a number of issues needed to be resolved. The "show install all impact ..." command may report that ISSU cannot be performed, with the following explanation: "Non-disruptive install not supported if L3 was enabled." In the testing environment this was due to the LAN_BASE_SERVICES_PKG being in use in the installed license file.
LAN_BASE_SERVICES_PKG includes L3 functionality. In order to perform the ISSU this package must be unused and the license file has to be cleared from the device with the "clear license LICENSEFILE" command. It is possible that the license file is currently in use by the device. To clear such a license file, first check which packages are in use with "show license usage" and disable the features of those packages.
Non-edge STP ports
During testing it was also necessary to shut down the northbound port-channel, as it did not pass the non-edge check (Criteria 3) of "show spanning-tree issu-impact", which would have led to a disruptive upgrade. This northbound port-channel was not listed as a vPC Edge port in the output of the "show spanning-tree vlan 1" command.
Loss of peer keepalive link
After the loss of the peer keepalive link (mgmt0), no disruption in the traffic was recorded. In this topology the management interface (mgmt0) is used as the keepalive link, hence its loss does not impact the data traffic generated during testing.
The devices notice the mgmt0 interface going down and the peer keepalives failing, but since the peer link is up, data plane communication can continue.
2015 Jul 14 12:11:28 bdsol-n5548-07 %IM-5-IM_INTF_STATE: mgmt0 is DOWN in vdc 1
2015 Jul 14 12:11:32 bdsol-n5548-07 %VPC-2-PEER_KEEP_ALIVE_RECV_FAIL: In domain 75, VPC peer keep-alive receive has failed
2015 Jul 14 12:12:07 bdsol-n5548-07 %IM-5-IM_INTF_STATE: mgmt0 is UP in vdc 1
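The timing of these events can be read directly from the syslog timestamps. The sketch below is one possible way to do that in Python, assuming the three messages above are available as separate lines of text; the helper names are hypothetical and not part of any Cisco tooling.

```python
from datetime import datetime

# The three syslog messages quoted above, one per line.
LOG_LINES = [
    "2015 Jul 14 12:11:28 bdsol-n5548-07 %IM-5-IM_INTF_STATE: mgmt0 is DOWN in vdc 1",
    "2015 Jul 14 12:11:32 bdsol-n5548-07 %VPC-2-PEER_KEEP_ALIVE_RECV_FAIL: "
    "In domain 75, VPC peer keep-alive receive has failed",
    "2015 Jul 14 12:12:07 bdsol-n5548-07 %IM-5-IM_INTF_STATE: mgmt0 is UP in vdc 1",
]


def parse_timestamp(line: str) -> datetime:
    """Parse the leading 'YYYY Mon DD HH:MM:SS' fields of a syslog line."""
    stamp = " ".join(line.split()[:4])
    # %b expects English month abbreviations (default C locale).
    return datetime.strptime(stamp, "%Y %b %d %H:%M:%S")


def first_event(lines: list[str], marker: str) -> datetime:
    """Return the timestamp of the first line containing the marker text."""
    for line in lines:
        if marker in line:
            return parse_timestamp(line)
    raise ValueError(f"no log line contains {marker!r}")


def seconds_between(lines: list[str], start_marker: str, end_marker: str) -> float:
    """Return the seconds elapsed from the start event to the end event."""
    delta = first_event(lines, end_marker) - first_event(lines, start_marker)
    return delta.total_seconds()


if __name__ == "__main__":
    detect = seconds_between(LOG_LINES, "mgmt0 is DOWN", "PEER_KEEP_ALIVE_RECV_FAIL")
    outage = seconds_between(LOG_LINES, "mgmt0 is DOWN", "mgmt0 is UP")
    print(f"keepalive failure logged {detect:.0f} s after mgmt0 went down")
    print(f"mgmt0 was down for {outage:.0f} s")
```

For the messages shown, the keepalive failure is logged 4 seconds after mgmt0 goes down, and the interface stays down for 39 seconds in total.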
Disabling vPC feature
This test describes what happens when vPC is disabled on one of the switches during live data transfer.
The vPC feature can be disabled using the following command in global configuration mode:
bdsol-n5548-07(config)# no feature vpc
Disabling the vPC feature on either the primary or the secondary vPC peer leads to an instant loss of data connectivity. This is due to the peer-based nature of vPC. As soon as the feature is disabled, all vPC functionality on the switch ceases, the peer link goes down, the vPC keep-alive status becomes Suspended and port-channel 101 of the testing environment goes down. This is evident in the "show vpc" output of the peer switch, which still has the vPC feature enabled.
bdsol-n5548-07# show vpc
Legend:
                (*) - local vPC is down, forwarding via vPC peer-link

vPC domain id                   : 75
Peer status                     : peer link is down
vPC keep-alive status           : Suspended (Destination IP not reachable)
...

vPC status
----------------------------------------------------------------------------
id     Port        Status Consistency Reason                     Active vlans
------ ----------- ------ ----------- -------------------------- -----------
101    Po101       down   success     success                    -
The traffic interruption, as before, is only short-lived.
Under the above-mentioned testing conditions, 50-80 packets were lost from a single session.
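To put the packet count in perspective, it can be converted into an equivalent amount of time at the test rate. This is a back-of-the-envelope sketch only: it assumes 1500-byte packets (the MTU is not stated in this document) and a sender holding a constant 6.5 Gbps. Because TCP slows down when it detects loss, the result is a lower bound on the interruption rather than a measurement of it.

```python
# Convert a lost-packet count into the time those packets occupy at line rate.
# PACKET_BYTES is an assumption; the MTU used in the test is not documented.

PACKET_BYTES = 1500   # assumed packet size
RATE_GBPS = 6.5       # test traffic rate used in this document


def equivalent_outage_us(lost_packets: int,
                         packet_bytes: int = PACKET_BYTES,
                         rate_gbps: float = RATE_GBPS) -> float:
    """Return the microseconds needed to send lost_packets at rate_gbps."""
    bits_lost = lost_packets * packet_bytes * 8
    return bits_lost / (rate_gbps * 1e9) * 1e6


if __name__ == "__main__":
    for lost in (50, 80):  # range of losses observed in this test
        print(f"{lost} packets ~ {equivalent_outage_us(lost):.1f} microseconds at {RATE_GBPS} Gbps")
```

Under these assumptions, 50-80 packets correspond to roughly 92-148 microseconds of traffic.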
Removing the "feature vpc" command also caused the vPC configuration to be removed from the port-channels.
This configuration needs to be re-added.
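As an illustration, the lines to re-add can be generated from a record of which port-channel carried which vPC statement. The sketch below covers only the two port-channels shown in this document; the function name is hypothetical. The "vpc domain" settings (domain 75 in this test) are not shown in this document and are left out here, although they would presumably need to be restored as well.

```python
# vPC statements per port-channel, taken from the configuration shown earlier.
VPC_BINDINGS = {
    "port-channel1": "vpc peer-link",
    "port-channel101": "vpc 101",
}


def build_restore_config(bindings: dict[str, str]) -> list[str]:
    """Return configuration lines that re-enable vPC and re-add the bindings.

    The vpc domain block is intentionally omitted: its contents are not
    given in this document.
    """
    lines = ["feature vpc"]
    for interface, vpc_command in bindings.items():
        lines.append(f"interface {interface}")
        lines.append(f"  {vpc_command}")
    return lines


if __name__ == "__main__":
    print("\n".join(build_restore_config(VPC_BINDINGS)))
```

Keeping such a mapping outside the switch makes it easier to spot which vPC statements are missing after the feature has been re-enabled.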
The vPC feature is intended to bring resiliency and performance by splitting the data traffic of a port channel among multiple devices.
This simple idea requires complicated control plane implementations.
The above tests were meant to show disruptions to both the control plane and the data plane which may occur during the life cycle of the feature.
As expected, data plane disruptions were detected and corrected almost immediately, with only single packets lost in tests.
The control plane disruptions tested show that vPC still maintains sub-second convergence time even when the control plane is affected.
The most disruptive test performed, the vPC peer link being shut down, potentially combines both data and control plane failure. Even so, a fast convergence time was demonstrated.