Cisco Nexus 7000 Series Switches

Nexus 7000 vPC Auto-Recovery Feature Configuration Example

Document ID: 116187

Updated: Jun 20, 2013

Contributed by Viral Bhutta, Cisco TAC Engineer.

Introduction

This document describes how to configure the virtual PortChannel (vPC) auto-recovery feature on the Nexus 7000.

Prerequisites

Requirements

There are no specific requirements for this document.

Components Used

This document is not restricted to specific software and hardware versions.

The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, make sure that you understand the potential impact of any command.

Background Information

Why do we need vPC Auto-Recovery?

There are two main reasons for this vPC enhancement:

  • In a data center outage or power outage, both Nexus 7000 vPC peers are off. Occasionally, only one of the peers can be restored. Since the other Nexus 7000 is still off, the vPC peer-link and the vPC peer-keepalive link are also off. In this scenario, the vPCs do not come up, even on the Nexus 7000 that is already on. All vPC configurations have to be removed from the port-channels on that Nexus 7000 in order for the port-channels to work. When the other Nexus 7000 comes on, you have to make configuration changes again in order to restore the vPC configuration for all the vPCs. In Release 5.0(2) and later, you can configure the reload restore command under the vPC domain configuration in order to address this problem.
  • The vPC peer-link goes off. Since the vPC peer-keepalive is still on, the vPC secondary peer device turns all its vPC member ports off due to dual-active detection. Hence, all the traffic goes through the vPC primary switch. If the vPC primary switch then also goes off, the traffic is black-holed, because the vPCs on the secondary peer device are still off: the secondary detected the dual-active condition before the vPC primary switch went off.

In Release 5.2(1) and later, the vPC auto-recovery feature merges these two enhancements.

Configuration

Configuration of vPC auto-recovery is straightforward. You need to configure auto-recovery under the vPC domain on both vPC peers.

This is an example configuration:

On Switch S1

S1(config)# vpc domain 1
S1(config-vpc-domain)# auto-recovery
S1# show vpc
Legend:
                (*) - local vPC is down, forwarding via vPC peer-link
vPC domain id                     : 1 
Peer status                       : peer adjacency formed ok    
vPC keep-alive status             : peer is alive
Configuration consistency status  : success
Per-vlan consistency status       : success
Type-2 consistency status         : success
vPC role                          : primary
Number of vPCs configured         : 5 
Peer Gateway                      : Enabled
Peer gateway excluded VLANs       : -
Dual-active excluded VLANs        : -
Graceful Consistency Check        : Enabled
Auto-recovery status              : Enabled (timeout = 240 seconds)

vPC Peer-link status
---------------------------------------------------------------------
id   Port   Status Active vlans  
--   ----   ------ --------------------------------------------------
1    Po1    up     1-112,114-120,800,810
                                
vPC status
-----------------------------------------------------------------------
id   Port   Status Consistency Reason                     Active vlans
--   ----   ------ ----------- ------                     ------------
10   Po40   up     success     success                    1-112,114-1
                                                          20,800,810    

On Switch S2

S2(config)# vpc domain 1
S2(config-vpc-domain)# auto-recovery
S2# show vpc
Legend:
               (*) - local vPC is down, forwarding via vPC peer-link
vPC domain id                     : 1 
Peer status                       : peer adjacency formed ok    
vPC keep-alive status             : peer is alive               
Configuration consistency status  : success
Per-vlan consistency status       : success
Type-2 consistency status         : success
vPC role                          : secondary
Number of vPCs configured         : 5 
Peer Gateway                      : Enabled
Peer gateway excluded VLANs       : -
Dual-active excluded VLANs        : -
Graceful Consistency Check        : Enabled
Auto-recovery status              : Enabled (timeout = 240 seconds)

vPC Peer-link status
---------------------------------------------------------------------
id   Port   Status Active vlans  
--   ----   ------ --------------------------------------------------
1    Po1    up     1-112,114-120,800,810
                                 
vPC status
----------------------------------------------------------------------
id   Port   Status Consistency Reason                     Active vlans
--   ----   ------ ----------- ------                     ------------
40   Po40   up     success     success                    1-112,114-1
                                                          20,800,810    

How does auto-recovery really work?

This section discusses each behavior mentioned in the Background Information section separately. The assumption is that vPC auto-recovery is configured and saved to the startup configuration on both switches S1 and S2.
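
The assumption above depends on the configuration having been saved on each peer. A minimal sketch of that step follows. The S1 and S2 prompts are taken from the configuration example; copy running-config startup-config is the standard NX-OS save command and is not part of the original example.

```text
S1# copy running-config startup-config
S2# copy running-config startup-config
```

Without this step, a switch that power cycles might come back without auto-recovery in its configuration.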

  1. A power outage shuts off both Nexus 7000 vPC peers simultaneously and only one switch is able to come on.
    116187-configure-vPC-01.jpg
    • S1 and S2 are both on. vPC is formed correctly with peer-link and peer-keepalive on.
    • Both S1 and S2 power off simultaneously.
    • Now only one switch is able to power on. For example, S2 is the only switch which powers on.
    • S2 waits for the vPC auto-recovery timeout in order to check whether either the vPC peer-link or the peer-keepalive link comes on. The default timeout is 240 seconds. It can be configured with the auto-recovery reload-delay x command, where x is 240-3600 seconds. If either of these links comes on, auto-recovery is not triggered.
    • After the timeout, if both links are still off (peer-link as well as peer-keepalive), vPC auto-recovery takes effect: S2 becomes primary and begins to bring up its local vPCs. Since there is no peer, the consistency check is bypassed.
    • Now S1 comes on. At this time, S2 retains its primary role and S1 takes the secondary role. A consistency check is performed, and appropriate actions are taken.
  2. The vPC peer-link goes off first, and then the primary vPC peer powers off.
    116187-configure-vPC-02.jpg
    • S1 and S2 are both on and vPC is formed correctly with peer-link and peer-keepalive on.
    • For some reason, vPC peer-link goes off first.
    • Since the vPC peer-keepalive is still on, the vPC secondary S2 detects a dual-active condition and turns off all its local vPCs.
    • Now the vPC primary S1 goes off or reloads.
    • This outage also turns off the vPC peer-keepalive link.
    • S2 waits for three consecutive peer-keepalive messages to be lost. If either the vPC peer-link comes on or S2 receives a peer-keepalive message in the meantime, auto-recovery does not take effect.
    • However, if the peer-link remains off and three consecutive peer-keepalive messages are lost, vPC auto-recovery takes effect.
    • S2 assumes the role of primary and enables its local vPCs, which bypasses the consistency check.
    • When S1 completes the reload, S2 retains its role of primary and S1 becomes secondary. A consistency check is performed, and appropriate actions are taken.
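
The timeout described in scenario 1 can be adjusted under the vPC domain. The following is a sketch only. It assumes vPC domain 1 from the configuration example, and the value 300 is an arbitrary illustration within the 240-3600 second range, not a recommendation.

```text
S2(config)# vpc domain 1
S2(config-vpc-domain)# auto-recovery reload-delay 300
```

Since auto-recovery is configured on both vPC peers, it seems reasonable to apply the same value on S1 as well. The Auto-recovery status line of the show vpc output, as seen in the configuration example, reports the timeout in use.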

Note: As explained in both scenarios, the switch that unsuspends its vPCs through vPC auto-recovery remains primary even after the peer-link comes on. The other peer takes the role of secondary and suspends its own vPCs until a consistency check is complete.

For example:

S1 is powered off. S2 becomes the operational primary as expected. The peer-link, the peer-keepalive link, and all vPC links are disconnected from S1. S1 is now powered on. Since S1 is completely isolated, it brings its vPCs up (although the physical links are down) due to auto-recovery and takes the role of primary. Now, if the peer-link or peer-keepalive link is connected between S1 and S2, S1 keeps the role of primary and S2 becomes secondary. This causes S2 to suspend its vPCs until both the vPC peer-link and the peer-keepalive link are on and the consistency check is complete. This scenario causes traffic to black hole, since S2 is secondary with its vPCs suspended and the S1 physical links are off.
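
To see which peer holds the primary role after a recovery like the one above, the role can be checked on each switch. This sketch uses the NX-OS show vpc role command, which is not part of the original example. Its output is omitted here because it varies by release.

```text
S1# show vpc role
S2# show vpc role
```

The vPC role line of the show vpc output in the configuration example also reports the role of each peer.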

Should I enable vPC auto-recovery?

It is a good practice to enable auto-recovery in your vPC environment.

There is a slight chance that the vPC auto-recovery feature might create a dual-active scenario. For example, if you first lose the peer-link and then lose the peer-keepalive, you have a dual-active scenario.

In this situation, each vPC member port continues to advertise the same Link Aggregation Control Protocol ID that it did before the dual-active failure.

A vPC topology intrinsically protects from loops in case of dual-active scenarios. In a worst case scenario, there are duplicate frames. Despite this, as a loop-prevention mechanism, each switch forwards Bridge Protocol Data Units (BPDUs) with the same BPDU Bridge ID as prior to the vPC dual-active failure.

While not intuitive, it is still possible and desirable to continue to forward traffic from the access layer to the aggregation layer without drops for current traffic flows, provided that the Address Resolution Protocol (ARP) tables are already populated on both Cisco Nexus 7000 Series peers for all needed hosts.

If new MAC addresses need to be learned by the ARP table, issues might arise. The issues arise because the ARP response from the server might be hashed to one Cisco Nexus 7000 Series device and not to the other, which makes it impossible for the traffic to flow correctly.

Suppose, however, that before the failure in the situation just described, traffic was equally distributed to both Cisco Nexus 7000 Series devices by a correct PortChannel and by an Equal Cost Multipath (ECMP) configuration. In this case, server-to-server and client-to-server traffic continues, with the caveat that single-attached hosts connected directly to the Cisco Nexus 7000 Series are not able to communicate (because of the lack of the peer-link). Also, new MAC addresses learned on one Cisco Nexus 7000 Series cannot be learned on the peer, which causes the return traffic that arrives on the peer Cisco Nexus 7000 Series device to flood.

Refer to page 19 of the Cisco NX-OS Software Virtual PortChannel: Fundamental Concepts for more information.

Verify

There is currently no verification procedure available for this configuration.

Troubleshoot

There is currently no specific troubleshooting information available for this configuration.
