Cisco Nexus 5000 Series Switches

Cisco FabricPath Design Guide: Using FabricPath with an Aggregation and Access Topology

Contents

Introduction

Hardware and Software Support

Next-Generation Data Center Architecture

FabricPath Technology Overview

Spine and Edge

Switch-ids

Conversational Learning

VLAN Trunking

Metrics

Multipath Load Balancing

Multidestination Trees

Multicast Forwarding

vPC+

Design Considerations for the Spine/Aggregation Layer

Topology Considerations

Declaring VLANs as FabricPath VLANs

Declaring Ports as FabricPath Ports

Topology 0

Unicast Layer 2 Multipathing

Building a Routed Spine

F1 and M1 Card Cooperation

M1 Card Routing Traffic

F1 Card Architecture and MAC Optimization

MAC Scalability Considerations

Gateway Routing

Active/Active Gateway Routing

Benefits of vPC+ at the Spine

"Dual-Active Exclude" (Peer-Link "Failure")

Connecting FabricPath Edge or Leaf Layer to the Spine Layer

Avoiding Flooding by Tuning ARP and Layer 2 Table

Multicast Routing Considerations

Multicast Forwarding in FabricPath

Routed Multicast Configuration

Routed Multicast with vPC+

Summary Recommendations

Design Considerations for the Edge and Access Layer

FabricPath VLANs

MAC Learning at the Access Layer

Learning the HSRP MAC Address

Direct Connectivity between Edge Switches

vPC+ at the Edge

Integration with FEX

Summary

FabricPath Scalability Considerations

FabricPath Convergence Times

Sample Configurations

Cisco Nexus 7000 Series Spine 1 (DC1-Agg1)

Cisco Nexus 7000 Series Spine 2 (DC1-Agg2)

Edge Cisco Nexus 5500 Switch (DC1-5500-1) without vPC+

Edge Cisco Nexus 5500 Switch (DC1-5500-1) with vPC+ and FEX A/A

Edge Cisco Nexus 5500 Switch (DC1-5500-2) with vPC+ and FEX A/A


Introduction

The introduction of Cisco® FabricPath technology in Cisco NX-OS Software Release 5.1(3) brings the benefits of routing protocols to Layer 2 Ethernet network environments. Cisco FabricPath technology is described in several documents available on cisco.com.
The advantages of using FabricPath include:

• MAC address scalability with conversational learning

• Spanning Tree Protocol independence: There is no longer any reliance on Spanning Tree. Each switch has a complete view of the Layer 2 topology, and it calculates the Layer 2 forwarding table based on a shortest-path-first calculation.

• Traffic distribution for unicast: Unicast Layer 2 traffic can take multiple equal-cost Layer 2 paths.

• Traffic distribution for multicast on multiple distribution trees: Multicast traffic can be distributed along two multidestination trees.

• More direct communication paths: Any topology is possible, so cabling two access or edge switches directly to each other creates a direct communication path, unlike what happens with Spanning Tree.

• Simplicity of configuration: The configuration of FabricPath is very simple. Tuning may still be advised, but by default the switches negotiate switch-id allocation with a protocol called Dynamic Resource Allocation Protocol (DRAP).

• Loop mitigation with a TTL field in the frame: Layer 2 loops, as they are known today in Spanning-Tree-Protocol-based Layer 2 networks, are mitigated by dropping frames that have been propagated across too many hops. The Layer 2 FabricPath frames include a Time to Live (TTL) field that is decremented at each hop. The initial value of the TTL field is 32.
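The TTL-based loop mitigation described above can be illustrated with a minimal Python sketch. It is a simplified model rather than a description of the hardware: the function name is hypothetical, and the exact drop condition (the frame is discarded once the TTL reaches zero) is an assumption. Only the starting value of 32 and the per-hop decrement come from this guide.

```python
INITIAL_TTL = 32  # starting TTL value stated in this guide


def hops_before_drop(initial_ttl: int = INITIAL_TTL) -> int:
    """Count how many hops a frame caught in a loop can traverse.

    Assumption (illustrative only): every switch decrements the TTL by one,
    and the frame is dropped as soon as the TTL reaches zero.
    """
    ttl = initial_ttl
    hops = 0
    while ttl > 0:
        ttl -= 1   # the TTL is decremented at each hop
        hops += 1
    return hops


if __name__ == "__main__":
    print("A looping frame is dropped after", hops_before_drop(), "hops")
```

Under these assumptions, a frame trapped in a transient loop is removed after a bounded number of hops instead of circulating indefinitely.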

This document describes one of many possible designs that are enabled by Cisco FabricPath, and it includes tuning recommendations and sample configurations.

Hardware and Software Support

The solution described in this document requires the following hardware and software:

• Cisco Nexus® 5500 products as the edge (or access) device in a FabricPath topology with NX-OS Software Release 5.1(3)N1(1). Note that in the context of FabricPath domain, the edge device is also known as the leaf device.

• Cisco Nexus 7000 Series products as the spine (or aggregation) device in a FabricPath topology with NX-OS Release 5.1(1) or later. At the time of this writing, the Cisco Nexus 7000 Series configuration that enables FabricPath is based on the F1 series cards, which provide 32 1/10-Gbps ports. The Layer 3 functionality is provided by M1 series cards.

• Cisco Nexus Fabric Extenders (FEX) platforms are supported with the FEX ports acting as FabricPath edge ports. FEX can be used to augment the number of edge ports or Classic Ethernet (CE) ports without adding configuration points.

• Proper licensing: You need the ENHANCED_LAYER2_PKG license installed. For details, visit: http://www.cisco.com/en/US/docs/switches/datacenter/sw/nx-os/licensing/guide/Cisco_NX-OS_Licensing_Guide_chapter1.html.

Note: The F1 cards are complemented by M1 cards for routing purposes. When M1 cards are used in the same virtual device context (VDC) as the F1 card, routing is offloaded to the M1 cards, and you can add routing capacity for the F1 card by placing more M1 ports into the same VDC.

This design guide refers to the following FabricPath features:

• FabricPath Shortest-Path-First-based forwarding

• FabricPath Layer 2 multipathing (Unicast traffic)

• FabricPath multidestination trees load balancing (unknown unicast, broadcast, and multicast traffic)

• Active/Active gateway routing using vPC+ configuration

• vPC+ at the edge switch for the purpose of connecting dual-homed servers or dual-homed Cisco Fabric Extenders

Next-Generation Data Center Architecture

The introduction of FabricPath enables a variety of Layer 2 topologies that were not possible before. Figure 1 illustrates some of them.

Figure 1. FabricPath enables new topologies

On the top-left of Figure 1 (topology a), you see a design where two spines connect to a set of edge devices, which looks more like an aggregation/access design. However, in contrast to designs based on Spanning Tree Protocol, any two endpoints can take all available paths via either spine.
The second design, topology b at the top-right, illustrates a topology where servers connect to edge devices in vPC mode, which in the case of FabricPath is called vPC+.
The bottom-left design, topology c, illustrates the ability for FabricPath to use multiple spines instead of just two. This gives very granular control over the available bandwidth: the more spines you add, the lower the oversubscription between the edge and the rest of the network.
The bottom-right design, topology d, illustrates the ability for FabricPath to provide direct connectivity between edge devices where direct communication between edge devices is desirable. Most of these topologies would not make sense with Spanning Tree, but they are almost plug-and-play topologies in a FabricPath environment.

Note: Direct connectivity between the spine devices is not strictly necessary. It depends on where the routing is performed and on whether you consider it necessary to establish a direct communication path between spines, whether spines are just used for fabric switching or are used for end node attachment too, and so on.

Figure 2 illustrates the traffic paths that any two given servers can take across a Cisco FabricPath topology:

Figure 2. Traffic paths in FabricPath designs

The Layer 3 demarcation point can be placed in several ways, but a very simple deployment model is to place the Layer 3 boundary at the aggregation point where all edge or access switches are aggregated. Figure 3 illustrates adding the routing function at the spine layer for the FabricPath cloud to connect to the rest of the network infrastructure:

Figure 3. Placing the Router at the Spine Layer

When placing the routing at the spine, the spine keeps switching Layer 2 traffic based on the switch-id, but it also becomes an edge for routed traffic. Yet another option to place the routing function is to connect the routers as an edge device to the spine. This approach ensures that the spine switches traffic exclusively based on switch-id, and it makes changes to the spine configuration very rare. Figure 14 illustrates this design. Analyzing this last design option is outside of the scope of this document.
In this design guide, the focus is on a topology that uses two spines and multiple edges. This topology is an easy transition for customers who are deploying Spanning Tree today and want to benefit from FabricPath without having to rewire their data center. In addition, it is easy to add spines if you want to lower the oversubscription ratio, without any substantial change to the design considerations that are the subject of this document.
Figure 4 describes the topology used in this design guide:

Figure 4. Topology used in this design guide

If you want, you can add ports to this design by adding Cisco Fabric Extenders (FEX), as shown in Figure 5:

Figure 5. Adding FEX to a FabricPath topology

FabricPath Technology Overview

This section provides an overview of FabricPath but it is not meant to be a tutorial on the technology. Several other white papers already exist on the Cisco public webpage that cover this topic.
The reason that FabricPath is almost a plug-and-play protocol, in contrast to several other technologies available in the market, is that it enables each device to build an overall view of the topology in a way that is similar to Open Shortest Path First (OSPF) protocol and other link state protocols.
Each device in the topology is identified by a switch-id, and all Layer 2 forwarding tables are built based on reachability to each switch-id (as opposed to reachability to a MAC address). The switch-id is dynamically assigned via the Dynamic Resource Allocation Protocol (DRAP), so no manual allocation of switch-ids is required. However, it is possible to manually assign a switch-id, in which case a dynamic check for misconfiguration is performed (as discussed in the "Switch-ids" section).
FabricPath multipath forwarding relies on building a view of the network using a Link State Routing Protocol similar to the one used in Layer 3 networks. This is achieved with a Layer 2 protocol based on Intermediate-System-to-Intermediate-System (IS-IS) routing, which doesn't require any Layer 3 capability on the FabricPath switching devices.

Spine and Edge

In FabricPath topologies, there are two types of "functions" (which can be performed by all FabricPath hardware):

• Edge (or leaf) devices: These devices have ports connected to Classic Ethernet devices (servers, firewalls, router ports, and so on) and ports connected to the FabricPath cloud (or FabricPath ports). Edge devices are able to map a MAC address to the destination switch-id.

• Spine devices: These devices exclusively interconnect edge devices. Spine devices switch exclusively based on the destination switch-id.

Figure 6. Classic Ethernet ports in a FabricPath topology

Traffic from edge ports is looked up in the Layer 2 forwarding table, and encapsulated into the MAC-in-MAC frame whose destination switch-id is the switch that the destination host is attached to. This allows sending traffic along multiple paths with the same end-to-end total cost and provides the ability to construct practically any topology and be sure that traffic will take the most direct path (lowest cost) between any two devices.

Switch-ids

In order to understand these concepts, let's consider the topology in Figure 7:

Figure 7. Example switch-id allocation

The following CLI capture from an edge device shows the output of the show fabricpath switch-id command. The output shows all the switch-ids present in the topology, and it marks with a * the switch-id of the device that the user is logged in to:
DC1-5500-1# show fabricpath switch-id
FABRICPATH SWITCH-ID TABLE
Legend: `*' - this system
=========================================================================
SWITCH-ID   SYSTEM-ID        FLAGS     STATE       STATIC   EMULATED
----------+----------------+------------+-----------+--------------------
*111        0005.73d4.9141   Primary   Confirmed   Yes      No
 112        0005.73fc.2001   Primary   Confirmed   Yes      No
 113        0005.73fc.207c   Primary   Confirmed   Yes      No
 121        0022.5579.d3c1   Primary   Confirmed   Yes      No
 122        108c.cf18.0941   Primary   Confirmed   Yes      No
The edge switch maintains a MAC address table with the mapping of MAC address to switch-ids as well as the local source MACs from its CE ports, as illustrated here:
DC1-5500-1# show mac address-table vlan 101
Legend:
        * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
        age - seconds since last seen,+ - primary entry using vPC Peer-Link
   VLAN    MAC Address       Type      age    Secure  NTFY  Ports
---------+-----------------+--------+---------+------+----+------------------
* 101      0000.0700.0000    dynamic   0      F       F     Eth1/5
* 101      0000.0700.0100    dynamic   10     F       F     112
* 101      0022.5579.d3c1    dynamic   10     F       F     121
* 101      108c.cf18.0941    dynamic   0      F       F     122
As you can see from the output, the locally attached MAC addresses are associated with the local port. The remote MAC addresses are associated with the switch-id of the destination.
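The lookup that an edge switch performs on this table can be sketched in Python. This is a simplified model under stated assumptions, not the switch implementation: the function name and the returned tuples are hypothetical, the table entries are copied from the output above, and flooding of unknown destinations follows the multidestination behavior described later in this guide.

```python
# MAC table for VLAN 101 on DC1-5500-1, copied from the CLI output above.
# A string value is a local Classic Ethernet port; an integer is a remote switch-id.
MAC_TABLE = {
    "0000.0700.0000": "Eth1/5",
    "0000.0700.0100": 112,
    "0022.5579.d3c1": 121,
    "108c.cf18.0941": 122,
}
LOCAL_SWITCH_ID = 111  # switch-id of DC1-5500-1 in the switch-id table


def forward(dst_mac: str):
    """Decide how the edge switch handles a frame received on an edge port."""
    entry = MAC_TABLE.get(dst_mac)
    if entry is None:
        # Unknown unicast: sent along a multidestination tree.
        return ("flood", None)
    if isinstance(entry, int):
        # Remote MAC: encapsulate toward the destination switch-id.
        return ("encapsulate", {"src_switch_id": LOCAL_SWITCH_ID,
                                "dst_switch_id": entry})
    # Local MAC: plain Ethernet forwarding out of the local port.
    return ("local", entry)


if __name__ == "__main__":
    for mac in ("0000.0700.0000", "0000.0700.0100", "0000.0700.ffff"):
        print(mac, "->", forward(mac))
```

The point of the sketch is that only the edge needs the MAC-to-switch-id mapping; once the destination switch-id is in the FabricPath header, no MAC lookup is needed in the core.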
On the spine devices, there is no MAC address table learning, so the forwarding table is purely switch-id-based. In the following example, you can see each switch-id and which port can be used to reach the destination:
FabricPath Unicast Route Table for Topology-Default
0/121/0, number of next-hops: 0
        via ----, [60/0], 3 day/s 08:11:51, local
1/111/0, number of next-hops: 1
        via Eth8/3, [115/40], 0 day/s 06:15:55, isis_fabricpath-default
1/112/0, number of next-hops: 1
        via Eth8/5, [115/40], 3 day/s 08:09:17, isis_fabricpath-default
[...]
Figure 8 illustrates how traffic is forwarded from MAC address A to MAC address B in a Cisco FabricPath topology:

Figure 8. FabricPath forwarding

Notice how the switch-id values are assigned by default to each FabricPath device via the DRAP protocol. However, it may be good practice to manually assign them in order to provide a more meaningful numbering scheme. In this latter scenario, a specific check is performed every time a new switch-id value is configured on a given FabricPath node, in order to prevent you from configuring a duplicate switch-id within the same domain. However, if two FabricPath switches from two different and isolated FabricPath domains converge into a single FabricPath domain, a mechanism intervenes to resolve conflicts:
DC1-5500-1# 2011 Aug 17 15:32:57 DC1-5500-1 %$ VDC-1 %$ fabricpath %FABRICPATH-2-FABRICPATH_LINK_BRINGUP_STALLED_STATIC: Link bringup stalled due to conflicts
DC1-5500-1# show fabricpath conflict all
====================================================
Fabricpath Port State Table
Port              State
---------------+-------------------------+----------
port-channel2     Suspended due to conflicts
==============================================
Fabricpath Conflicts
SYSTEM-ID         SWITCH-ID      STATIC
---------------+--------------+---------------
0005.73d4.9141    3              Yes
108c.cf18.0941    3              Yes
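The conflict reported above (two system-ids claiming the same static switch-id) can be reproduced with a short Python sketch. It only illustrates the duplicate check; the function name is hypothetical, the third switch is added purely for contrast, and the sketch does not model how the switches actually resolve or suspend the link.

```python
from collections import defaultdict


def find_conflicts(switches):
    """Return {switch_id: [system_ids]} for every switch-id claimed more than once.

    `switches` is an iterable of (system_id, switch_id) pairs.
    """
    claimed_by = defaultdict(list)
    for system_id, switch_id in switches:
        claimed_by[switch_id].append(system_id)
    return {sid: sorted(owners)
            for sid, owners in claimed_by.items() if len(owners) > 1}


if __name__ == "__main__":
    merged_domain = [
        ("0005.73d4.9141", 3),    # from the conflict output above
        ("108c.cf18.0941", 3),    # from the conflict output above
        ("0005.73fc.2001", 112),  # non-conflicting switch, included for contrast
    ]
    print(find_conflicts(merged_domain))
```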

Conversational Learning

Edge switches learn a MAC address only if the node is connected to a local edge port, or if the MAC address is remote and is in active conversation with a known MAC address connected to a local edge port. In other words, a simple flood of traffic (either unknown unicast or broadcast) from a remote switch does not populate the MAC address table of an edge switch.
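The learning rule can be expressed as a small Python model. This is a sketch under stated assumptions: the class and method names are hypothetical, aging is ignored, and the MAC addresses and switch-id are borrowed from the earlier CLI output for illustration only.

```python
class EdgeSwitch:
    """Toy model of conversational MAC learning on a FabricPath edge switch."""

    def __init__(self):
        # mac -> ("local", port) or ("remote", switch_id)
        self.mac_table = {}

    def receive_from_edge_port(self, src_mac, port):
        # Sources seen on local Classic Ethernet ports are always learned.
        self.mac_table[src_mac] = ("local", port)

    def receive_from_fabric(self, src_mac, src_switch_id, dst_mac):
        """Learn a remote source only if the destination is a known local MAC."""
        dst = self.mac_table.get(dst_mac)
        if dst is not None and dst[0] == "local":
            self.mac_table[src_mac] = ("remote", src_switch_id)
            return True
        # Broadcast or unknown unicast floods do not populate the table.
        return False


if __name__ == "__main__":
    sw = EdgeSwitch()
    sw.receive_from_edge_port("0000.0700.0000", "Eth1/5")
    print(sw.receive_from_fabric("0000.0700.0100", 112, "ffff.ffff.ffff"))
    print(sw.receive_from_fabric("0000.0700.0100", 112, "0000.0700.0000"))
    print(sw.mac_table)
```

In the usage example, the broadcast from the remote host is ignored for learning purposes, while the frame addressed to the locally attached host causes the remote MAC to be learned against switch-id 112.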

VLAN Trunking

In spanning tree networks, the user has to specify which VLANs belong to which port by using the switchport trunk allowed vlan command. With FabricPath, the user doesn't need to explicitly specify which VLANs are carried on a FabricPath-enabled link (VLANs obviously must be defined on the switches).
The configuration of a FabricPath core port is performed with the command switchport mode fabricpath. The command switchport mode trunk puts a port into Classic Ethernet mode instead of FabricPath mode.

Metrics

The preferred path to any switch-id is calculated based on the metric to any given destination. The metric is as follows:

• 1-Gbps Ethernet links have a cost of 400

• 10-Gbps Ethernet links have a cost of 40

• 20-Gbps links have a cost of 20

Figure 9 illustrates the use of the link cost in a real topology:

Figure 9. Metrics calculation in FabricPath

The following output shows the cost of the FabricPath links on switch 112 in Figure 9:
DC1-5500-2# show fabricpath isis interface br
Fabricpath IS-IS domain: default
Interface      Type  Idx  State     Circuit  MTU   Metric  Priority  Adjs/AdjsUp
--------------------------------------------------------------------------------
Ethernet1/1    P2P   1    Up/Ready  0x01/L1  1500  40      64        1/1
Ethernet1/2    P2P   2    Up/Ready  0x01/L1  1500  40      64        1/1
Ethernet1/11   P2P   3    Up/Ready  0x01/L1  1500  400     64        1/1
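As a worked example of how these link costs add up to a path metric, the following Python sketch sums the per-link costs and keeps the lowest-cost paths. The cost values come from the list above; the function names are hypothetical, and the second hop of each path (and the whole path through Ethernet1/11) is an assumption made for illustration, because the cabling in Figure 9 is not reproduced here.

```python
# Link cost per link speed in Gbps, as listed in this section.
LINK_COST = {1: 400, 10: 40, 20: 20}


def path_metric(link_speeds_gbps):
    """End-to-end metric: the sum of the cost of every link on the path."""
    return sum(LINK_COST[speed] for speed in link_speeds_gbps)


def best_paths(paths):
    """Return (lowest metric, sorted names of all paths with that metric)."""
    metrics = {name: path_metric(links) for name, links in paths.items()}
    best = min(metrics.values())
    return best, sorted(name for name, m in metrics.items() if m == best)


if __name__ == "__main__":
    # Assumed paths from switch 112 to another edge switch.
    paths = {
        "via Eth1/1": [10, 10],   # 10-Gbps uplink plus an assumed 10-Gbps downlink
        "via Eth1/2": [10, 10],
        "via Eth1/11": [1, 10],   # hypothetical path that starts on the 1-Gbps link
    }
    print(best_paths(paths))
```

With two 10-Gbps hops the metric is 40 + 40 = 80, so the two 10-Gbps uplinks tie as equal-cost paths, while any path that starts on the 1-Gbps link is more expensive and is not used.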

Multipath Load Balancing

Figure 10 illustrates unicast load balancing in a FabricPath network:

Figure 10. Unicast load balancing in FabricPath

In this example, each edge device has two equal-cost paths to every other edge device, one through each spine:
DC1-5500-2# show fabricpath route
1/113/0, number of next-hops: 2
via Eth1/1, [115/80], 3 day/s 19:36:09, isis_fabricpath-default
via Eth1/2, [115/80], 0 day/s 18:40:16, isis_fabricpath-default
The load balancing is based on fields such as the IP address, Layer 4 port, and so on, and it is configured with the following command:
DC1-5500-2(config)# fabricpath load-balance unicast < destination, include-vlan, layer-3, layer4, mixed, source, source-destination>
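The idea behind flow-based load balancing across equal-cost next hops can be sketched in Python. This is an illustration only: the function name is hypothetical, and SHA-256 is a stand-in chosen for the example, since this guide does not describe the hash that the hardware actually computes. What the sketch does show is the key property of such schemes: all frames of one flow hash to the same next hop, while different flows spread across the available paths.

```python
import hashlib

NEXT_HOPS = ["Eth1/1", "Eth1/2"]  # the two equal-cost next hops from the route above


def pick_next_hop(next_hops, src_ip, dst_ip, src_port, dst_port, vlan):
    """Map a flow to one of the equal-cost next hops.

    Assumption: the flow key combines Layer 3 and Layer 4 source and
    destination fields with the VLAN. The real hardware hash differs.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{vlan}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


if __name__ == "__main__":
    for port in range(1024, 1030):
        print(port, pick_next_hop(NEXT_HOPS, "10.1.1.10", "10.1.1.20", port, 80, 101))
```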

Multidestination Trees

In FabricPath, multicast, broadcast, and flooded traffic is forwarded along a multidestination tree. FabricPath allows for multiple multidestination trees in order to achieve traffic load balancing for multidestination frames. This is achieved by using a special tag present in FabricPath frames, called the Forwarding Tag or FTag.
The FTag, together with the type of the frame's destination address (unknown unicast, broadcast, or multicast), selects a forwarding graph, which is the set of possible switches and interfaces along which the frame is forwarded. At each switch, the FTag carried in the frame header is used to forward the frame along the interfaces that are in the forwarding graph.
FabricPath uses a link state protocol to determine the forwarding trees in the network. FabricPath technology allows the definition of multiple topologies.
At the time of this writing, only one topology is supported, in which there are two multidestination trees. As Figure 11 illustrates, two FTags are used: FTag1 and FTag2. In this example, each one of these multidestination trees is rooted at one of the spines.

Figure 11. FTag trees assignment in FabricPath

FTag1 is used for the first multidestination tree and FTag2 is used for the second multidestination tree. FTag trees are used as follows:

• FTag1 is used for unknown unicast, broadcast, and multicast. The highest priority switch in the FabricPath topology is chosen as the root for FTag1. As discussed later in the guide, it is possible to modify the default FabricPath root priority value in order to place the root for FTag1 or FTag2 on a given device in a deterministic fashion. If the default priority is maintained on all the switches that are part of the FabricPath domain, the switch system-id value is used as a tiebreaker to determine the switch with the highest root priority.

• FTag2 is used only for multicast traffic. The second highest priority switch in the FabricPath topology is chosen as the root for FTag2.

Here are the factors, in order of precedence, that are used to elect the root for FTag1 and subsequently for FTag2:

• Highest root priority

• Highest system-id

• Highest switch-id
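The election order above can be expressed as a sort in Python. This is a minimal sketch: the function names are hypothetical, the system-ids and switch-ids come from the switch-id table shown earlier, and the root priority values (64 on every switch, 255 on the switch that is raised) are assumptions made for illustration rather than values taken from this guide.

```python
def system_id_value(system_id: str) -> int:
    """Convert a dotted system-id such as 0005.73d4.9141 to an integer."""
    return int(system_id.replace(".", ""), 16)


def elect_roots(switches):
    """Return (FTag1 root switch-id, FTag2 root switch-id).

    Switches are ranked by highest root priority, then highest system-id,
    then highest switch-id, as listed above.
    """
    ranked = sorted(
        switches,
        key=lambda s: (s["root_priority"],
                       system_id_value(s["system_id"]),
                       s["switch_id"]),
        reverse=True,
    )
    return ranked[0]["switch_id"], ranked[1]["switch_id"]


# Switch-ids and system-ids from the switch-id table; priority 64 is assumed.
SWITCHES = [
    {"switch_id": 111, "system_id": "0005.73d4.9141", "root_priority": 64},
    {"switch_id": 112, "system_id": "0005.73fc.2001", "root_priority": 64},
    {"switch_id": 113, "system_id": "0005.73fc.207c", "root_priority": 64},
    {"switch_id": 121, "system_id": "0022.5579.d3c1", "root_priority": 64},
    {"switch_id": 122, "system_id": "108c.cf18.0941", "root_priority": 64},
]

if __name__ == "__main__":
    print(elect_roots(SWITCHES))
```

With equal priorities the system-id decides the outcome, which is why setting the root priority explicitly is the way to place the roots deterministically.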

Figure 12 illustrates the forwarding of flooded frames along the FTag1 tree:

Figure 12. Forwarding of unknown unicast and broadcast frames

Multicast Forwarding

Multicast traffic forwarding uses FTag trees based both on load balancing of multicast frames and on tree pruning that is governed by Internet Group Management Protocol (IGMP) snooping (platform-specific load-balancing methods are discussed in the design sections of this guide). The edge device performs IGMP snooping and the member reports are conveyed in FabricPath via the Group-Membership Link State Protocol (GM-LSP). IGMP snooping and IS-IS work in conjunction to build per-VLAN multicast group-based trees in the FabricPath network.

vPC+

A virtual PortChannel (vPC) allows links that are physically connected to two different Cisco Nexus switches to appear as a single PortChannel to a third device. This provides a loop-free topology, eliminating the spanning-tree-blocked ports and maximizing the bandwidth usage. In a FabricPath network, a host or legacy (not FabricPath-enabled) Ethernet switch can be connected through a port channel to two FabricPath edge switches by using a configuration construct called emulated switch. The emulated switch implementation in FabricPath, where two FabricPath edge switches provide a vPC to a third-party device, is called vPC+.
Emulated switch is a construct in which two FabricPath switches emulate a single switch to the rest of the FabricPath network. The packets originated by the two emulated switches are sourced with the emulated switch-id. The other FabricPath switches are not aware of this and simply see the emulated switch (identified by a dedicated switch-id value) as reachable through both switches. This means that the two emulated switches have to be directly connected via a peer link, and there should be a peer-keepalive path between the two switches to form the vPC+.
Other than the fact that a peer-link is used to synchronize MAC addresses between vPC+ peers, the peer-link is a regular FabricPath link, and as such it can be used by orphan ports for direct communication.
The following list provides the reference terminology for vPC+ components:

• vPC+: The functionality enabled on two independent upstream devices to make them appear as a single logical device to the downstream switches, allowing the establishment of PortChannel connections. All the MAC addresses learned from devices locally attached to downstream devices are advertised to the FabricPath domain as "connected" to the defined emulated switch.

• vPC+ peer device: One of a pair of devices that are connected with the special port channel known as the vPC+ peer link.

• vPC+ peer link: The link used to synchronize states between the vPC+ peer devices. Both ends must be on 10 Gigabit Ethernet interfaces.

• vPC+ domain: This domain is formed by the two vPC+ peer devices. It is also a configuration mode for configuring some of the vPC+ peer link parameters.

• vPC+ peer-keepalive link: The peer-keepalive link, a Layer 3 link between the vPC+ peer devices, is used to ensure that both devices are up. The peer-keepalive link sends periodic keepalive messages between the vPC+ peers on an out-of-band path.

• vPC+ member port: Interface that belongs to the vPC+.

Design Considerations for the Spine/Aggregation Layer

The specific topology discussed and validated in the context of this design guide consists of a pair of spine devices connected to multiple edge devices. Cisco Nexus 7000 Series platforms are used as spine devices, whereas Cisco Nexus 5500 Switches are connected at the edge.
The spine in this design is also used to perform the routing function between the FabricPath cloud and the rest of the network. The default gateway for the servers is located in the spine layer, which, being also the router in the topology, performs the function of an edge in FabricPath. At the time of this writing, the Cisco Nexus 7000 Series hardware combination that provides this capability consists of F1 cards (spine function) together with the M1 cards (routing function).
In the topology shown in Figure 13, the spine layer, which is also an aggregation layer since it aggregates multiple edge switches, operates simultaneously in two modes:

• It forwards Layer 2 traffic between the edge devices exclusively based on the destination switch-id, without the need to learn MAC addresses (east-to-west traffic).

• It also learns MAC addresses exclusively for the purpose of encapsulating routed traffic into FabricPath frames (north-to-south traffic).

Figure 13. Layer 3 Aggregation/Spine

These two behaviors are completely independent: even if, as a theoretical example, the 128K adjacency table of the M1 card filled up completely, the east-to-west Layer 2 traffic would still be forwarded based on the switch-id and not flooded.
Compared with a spanning-tree design or a vPC design, this design includes the following advantages:

• Ease of configuration

• Multipath forwarding for unicast and multicast Layer 2 and Layer 3 traffic

• Faster convergence times

• Loop mitigation thanks to the use of a Time to Live (TTL) field in FabricPath frames

The focus of this section is on the design of this hybrid Layer 2/Layer 3 spine/aggregation layer. This is not the only possible design option. For instance, you can also add Layer 3 to an edge switch and use the spine as a pure Layer 2 spine, or connect a pair of Layer 2 edge switches to dedicated routing devices, as highlighted in Figure 14. It is outside the scope of this document to discuss these specific design options further.

Figure 14. FabricPath design with separate L3 edge

Topology Considerations

FabricPath technology allows building multiple logical topologies on the same physical topology. At the time of this writing, this functionality has not yet been enabled by the software, so you should not attempt to create additional topologies besides the default topology 0.

Declaring VLANs as FabricPath VLANs

All VLANs that are meant to be forwarded according to Cisco FabricPath rules must be created as FabricPath VLANs.
A VLAN needs to be explicitly configured for Classic Ethernet (CE) mode or for FabricPath mode. By default, all VLANs are in CE mode. These CE VLANs are not brought up on FabricPath core links. They can be forwarded on the FabricPath edge ports locally or to other Classic Ethernet switches.
The command required to configure a VLAN in FabricPath mode is:
DC1-Agg1(config)# vlan 101
DC1-Agg1(config-vlan)# mode fabricpath
The FabricPath VLANs are forwarded on FabricPath core links using MAC-in-MAC header encapsulation and on FabricPath edge links without the MAC-in-MAC header. FabricPath VLANs can be grouped into topologies for traffic engineering. By default, all FabricPath VLANs belong to the base topology 0.
CE VLANs and FabricPath VLANs cannot share a link unless the link is a FabricPath edge port.
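The rules in this section can be condensed into a small Python decision function. It is a sketch for illustration only: the function name and the string labels for VLAN modes, port types, and results are hypothetical, and the function models nothing beyond the three statements made above.

```python
def vlan_on_link(vlan_mode: str, port_type: str):
    """Return how a VLAN is carried on a port, or None if it is not carried.

    vlan_mode: "ce" or "fabricpath"
    port_type: "fabricpath-core" or "edge"
    """
    if vlan_mode not in ("ce", "fabricpath"):
        raise ValueError(f"unknown VLAN mode: {vlan_mode}")
    if port_type == "fabricpath-core":
        # Only FabricPath VLANs come up on core links, with MAC-in-MAC encapsulation.
        return "mac-in-mac" if vlan_mode == "fabricpath" else None
    if port_type == "edge":
        # Edge ports carry both kinds of VLAN without the MAC-in-MAC header.
        return "classic-ethernet"
    raise ValueError(f"unknown port type: {port_type}")


if __name__ == "__main__":
    for mode in ("ce", "fabricpath"):
        for port in ("fabricpath-core", "edge"):
            print(mode, port, "->", vlan_on_link(mode, port))
```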

Declaring Ports as FabricPath Ports

All ports that are carrying FabricPath traffic are defined as type fabricpath. With FabricPath, you don't configure trunking and pruning of VLANs on FabricPath ports. The required configuration to declare a port as a FabricPath port is very simple:
DC1-Agg1(config)# interface ethernet 8/3
DC1-Agg1(config-if)# description FP Link to DC1-5500-1
DC1-Agg1(config-if)# switchport mode fabricpath
DC1-Agg1(config-if)# no shutdown

Topology 0

Topology 0 is the default topology; it always exists and cannot be removed:
DC1-Agg1(config)# fabricpath topology ?
<1-63> Fabricpath Topology ID 1-63
DC1-Agg1# show fabricpath isis topology ?
<0-63> Specific topology information
summary Display summary topology information
The only configuration that you may want to modify in order to optimize traffic distribution is the root choice for the multidestination trees.

Unicast Layer 2 Multipathing

Unicast Layer 2 multipathing is on by default, but you can select the load-balancing mechanism for Layer 2 traffic (by default it uses the source and destination Layer 2, Layer 3, or Layer 4 fields and the VLAN) with the following command:
fabricpath load-balance unicast < destination, include-vlan, layer-3, layer4, mixed, source, source-destination>

Building a Routed Spine

In this design recommendation, the spine also performs routing functions, and as a result the default gateway must be configured. When the spine is built with the Cisco Nexus 7000 Series product family, you need to understand the integration of F1 and M1 cards. This integration provides the combined benefits of a FabricPath switch for Layer 2 traffic and decapsulation and encapsulation of Layer 3 traffic into the FabricPath cloud, without a need for any external wiring or additional equipment.

F1 and M1 Card Cooperation

When an F1 card is used in a Cisco Nexus 7000 Series that provides the spine function (pure spine deployment), no remote MAC address learning occurs on the spine, as previously described. The F1 card doesn't do routing either.
M1 cards instead provide the ability to perform routing between VLANs and toward the routed domain of the data center network.
If an M1 card is present in the same virtual device context (VDC) as the F1 card, the F1 card will perform learning, but this doesn't mean that the F1 card forwards FabricPath traffic based on MAC addresses. The F1 card forwards the east-to-west traffic using the remote switch-id.
In this configuration, MAC addresses are learned on the F1 card exclusively with the purpose of being able to encapsulate north-to-south routed traffic into FabricPath.
In addition to this, the learning is optimized in the sense that each component of the F1 card learns only the locally connected MAC addresses.
This optimization requires the use of FabricPath on the links connecting to the edge. The reason is that the edge encapsulates the frames with the destination switch-id so even if learning is performed, it is only for the purpose of routed traffic, not for the east-to-west traffic.
In order to fully understand this concept, you need to consider the architecture of the F1 card, which is highly optimized for MAC scalability.

M1 Card Routing Traffic

Figure 15 illustrates the traffic flow for FabricPath traffic that is routed via the M1 card. Traffic routed from one FabricPath VLAN to another FabricPath VLAN enters via F1 cards, exits via F1 cards, and is routed by the M1 card.
If traffic needs to be routed from the FabricPath cloud to a Layer 3 infrastructure, it enters from an F1 card and it exits via an M1 card port.

Figure 15. Traffic forwarding with M1/F1 linecards in a Nexus 7000 chassis

F1 Card Architecture and MAC Optimization

East-to-west FabricPath traffic (or in other words pure Layer 2 traffic) doesn't engage the M1 card at all, nor does it require it. Pure Layer 2 traffic is handled by F1 cards and switched exclusively based on the switch-id information, as illustrated in Figure 16:

Figure 16. Forwarding with L2 traffic in a mixed M1/F1 chassis

Because the M1 card is present and routed traffic requires being encapsulated into a FabricPath header, the F1 card also stores MAC address mappings to destination switch-ids; hence some learning occurs, which is not used for Layer 2 switched traffic. In order to understand the scalability of this solution, you have to consider the architecture of the F1 card, which is based on the Switch-on-a-Chip concept (SoC).
Each SoC performs switching as if it were an independent switch, and forwarding tables are not synchronized. Therefore, by distributing edge switches on multiple SoCs, you can take advantage of the aggregated capacity of all forwarding tables on the SoCs (Figure 17).

Figure 17. High level view of ASICs arrangement in the F1 linecard

MAC Scalability Considerations

In an M1+F1 solution and assuming that the edge switches are connected via FabricPath to the F1 card, each SoC learns the MAC addresses that are specific to it.
Layer 2 traffic that doesn't leverage the default gateway doesn't require space in the Layer 2 table as it is switched exclusively based on the switch-id information. The MAC learning is required exclusively for routing purposes and Layer 2 switched traffic source MACs aren't learned on the F1 card unless a switch virtual interface (SVI) is present on the VLAN.
Since each SoC provides a capacity of 16K MAC addresses for the purpose of encapsulating routed traffic, the hardware could provide a capacity of 256K MAC addresses on the F1 card and 128K MAC addresses on the M1 card, so the solution could accommodate up to 128K end devices routed by this combination of cards. A feature called mac-proxy (currently planned for CY12) could unleash this scalability. At the time of this writing, in certain failure scenarios, the traffic routed to the FabricPath cloud may use up to a maximum of 16K MAC addresses, after which traffic routed to any additional MAC address would be flooded on one of the FTag trees.
Notice that this behavior has no impact on the Layer 2 east-to-west traffic, because that traffic is switched exclusively based on the remote switch-ids.
Figure 18 illustrates how the Cisco Nexus 7000 Series builds the adjacency table for traffic routed to the FabricPath cloud: the 16K source MAC addresses sourced by Cisco Nexus 5500-1 are learned only by the SoC associated with port 8/3. The 8K source MAC addresses sourced by Cisco Nexus 5500-2 are learned only by the SoC associated with port 8/5.

Figure 18. MAC address learning with M1/F1 linecards

The following CLI capture shows the SoC ASIC utilization with the traffic pattern described in Figure 18.
L2 Forwarding Resources
-----------------------
L2 entries: Module  inst  total   used  mcast  ucast  lines  lines_full
                 8     1  16384      4      0      4   1024           0
                 8     2  16384  16002      0  16002   1024         642
                 8     3  16384   8001      0   8001   1024           0
                 8     4  16384      1      0      1   1024           0
                 8     5  16384      0      0      0   1024           0
                 8     6  16384      0      0      0   1024           0
                 8     7  16384      0      0      0   1024           0
                 8     8  16384      0      0      0   1024           0
                 8     9  16384   2001      0   2001   1024           0
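The per-SoC figures in a capture like this one can be summarized with a short script. The following Python sketch is illustrative only: the function name `soc_utilization` and the embedded sample rows (copied from the capture) are assumptions made for the example, and it assumes the eight-column row layout shown in this capture.

```python
# Sample data rows copied from the capture: module, inst, total, used,
# mcast, ucast, lines, lines_full.
SAMPLE = """\
8 1 16384 4 0 4 1024 0
8 2 16384 16002 0 16002 1024 642
8 3 16384 8001 0 8001 1024 0
8 9 16384 2001 0 2001 1024 0
"""


def soc_utilization(text):
    """Return {(module, inst): fraction of L2 entries in use} per data row."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly eight integer columns; skip headers and rules.
        if len(fields) != 8 or not all(f.isdigit() for f in fields):
            continue
        module, inst, total, used = (int(f) for f in fields[:4])
        result[(module, inst)] = used / total
    return result


if __name__ == "__main__":
    for (module, inst), frac in sorted(soc_utilization(SAMPLE).items()):
        print(f"module {module} SoC {inst}: {frac:.1%} of L2 entries used")
```

A summary of this kind makes it easy to spot a SoC that is close to its 16k limit while neighboring SoCs are nearly empty.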
So what is the scalability in terms of MAC addresses on a FabricPath design? For traffic that is purely Layer 2 - that is, traffic whose VLAN doesn't include an SVI - the scalability is exclusively dependent on the MAC table sizes of the edge devices in the topology; thus, the more edge devices, the more MAC entries you can switch. As an example of such VLANs, consider VLANs that are used to interconnect virtual machines (VMs) to create a virtual data center in a multitier environment. These VLANs often do not require any routing.
Figure 19 illustrates a typical virtual data center (vDC): there can be several isolated networks and "fenced" networks (that is, connected via a gateway), so not all networks require a default gateway on the spine.

Figure 19. Example of a virtual data center network

How many edge switches you can have in the same Layer 2 domain is bound to the maximum number of switch-ids that have been validated by testing. These numbers are published in the release notes or in the configuration limits in the documentation on the Cisco Nexus 5000 Series and Cisco Nexus 7000 Series.

Note: At the time of this writing, this number varies from 64 to 128 switch-ids, depending on the software release and the platform.

For VLANs that are routed, the scalability ranges from 16k to 128k MAC addresses, depending on the traffic patterns; this capacity is provided by the M1/F1 combination.

Note: As previously mentioned, at the time of this writing and with this specific design, should the links connecting edge to spine break, forcing traffic through the asymmetric path connecting the spines, the number of routed MAC addresses is reduced to 16k.

A functionality named mac-proxy is on the roadmap for a future NX-OS software release and will eliminate this caveat. Once this functionality is available (currently planned for 1HCY12), it would truly be possible to consider the MAC address scalability of a FabricPath domain as dictated only by the M1 MAC table space (minimum 128k), even in the presence of multiple link failures.
Notice that even if the total MAC count is exceeded, only routed traffic will experience flooding. Layer 2 traffic is switched based on the switch-id information and therefore will not be flooded.
For more information about scalability, see the "FabricPath Scalability Considerations" section at the end of this guide.

Gateway Routing

The gateway protocols work in FabricPath the same way they work in regular Ethernet networks. The default Active/Standby behavior of Hot Standby Router Protocol (HSRP) doesn't negate the value of FabricPath. Even without any further configuration than the standard configuration of HSRP, the Layer 2 traffic (east-to-west) would benefit from multipath forwarding.
Routed traffic (south-to-north) would instead be forwarded only to the active HSRP device, as shown in Figure 20:

Figure 20. The default HSRP configuration doesn't make use of both spines

Active/Active Gateway Routing

To forward traffic through multiple spines, you can use the Gateway Load Balancing Protocol (GLBP), which hands out a different gateway MAC address (up to four different MAC addresses) in a round-robin fashion. A better approach is to use vPC+ at the spine.
To do so, connect the spine devices with an additional FabricPath link (which is recommended anyway to optimize multidestination tree forwarding) and define it as a vPC+ peer-link.
The vPC+ peer-link must be built using F1 ports and not M1 ports, since it must be configured as a FabricPath link (Figure 21).

Figure 21. vPC+ makes active and standby HSRP devices active

Figure 21 illustrates how traffic originating from the server connected to the edge and destined to its default-gateway can use two equal cost paths. Notice that the Active and Standby HSRP roles refer only to the control plane, as both aggregation devices are able to actively route traffic on the data plane.
The configuration to achieve this behavior is as simple as follows:
vpc domain 1
  role priority 110
  peer-keepalive destination 10.60.17.141
  fabricpath switch-id 1000
interface port-channel10
  switchport mode fabricpath
  vpc peer-link
Even if the configuration requires the definition of a vPC domain, there is no real vPC port in the design described in this topology. Notice that you need to configure a switch-id under the vPC domain configuration. This identifies the emulated switch.
Because of this switch-id configuration, HSRP, VRRP, and GLBP announce their vMAC as coming from the emulated switch-id instead of the switch-id of each individual spine. By doing this, the edge switches can forward HSRP frames to either spine.
On both spines, the HSRP MAC address is programmed in the MAC tables with the special flag (G-flag) that indicates that this traffic is meant to be routed.
DC1-Agg1# show mac address-table
Legend:
* - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
age - seconds since last seen, + - primary entry using vPC Peer-Link
VLAN MAC Address Type age Secure NTFY Ports/SWID.SSID.LID
---------+-----------------+--------+---------+------+----+------------------
G 101 0000.0c07.ac01 static - F F sup-eth1(R)

Benefits of vPC+ at the Spine

In this design guide, the use of vPC+ at the spine is strongly recommended. vPC+ gives you the ability to forward routed traffic to multiple routing engines as well as to optimize failover times.
As a result of this configuration, the edge switches learn the association of HSRP Virtual MAC addresses with the emulated switch-id instead of the individual spine switch-id:
DC1-5500-1# show mac address-table | include 0c07
* 101 0000.0a00.0c07 dynamic 10 F F Eth1/5
* 101 0000.0c07.ac01 dynamic 0 F F 1000
As Figure 22 illustrates, the use of vPC+ allows the routed traffic between any two given endpoints to benefit from both Layer 2 equal-cost multipathing and from the aggregated routing capacity of the spines.

Figure 22. Traffic load balancing for both Layer 2 and Layer 3 traffic

The existence of a dual-active path from each edge device to the emulated switch can be verified by looking at the FabricPath routing table, as highlighted here:
DC1-5500-1# sh fabricpath route
FabricPath Unicast Route Table
`a/b/c' denotes ftag/switch-id/subswitch-id
`[x/y]' denotes [admin distance/metric]
ftag 0 is local ftag
subswitch-id 0 is default subswitch-id
FabricPath Unicast Route Table for Topology-Default
1/1000/0, number of next-hops: 2
via Eth1/1, [115/40], 0 day/s 03:39:23, isis_fabricpath-default
via Eth1/3, [115/40], 0 day/s 03:39:23, isis_fabricpath-default

"Dual-Active Exclude" (Peer-Link "Failure")

As a result of declaring the link that connects the spines as a vPC peer-link, the default behavior of vPC applies whereby if the peer-link goes down, the SVIs on the vPC secondary device are shut down.
In the context of FabricPath designs, this behavior is not beneficial, because the FabricPath links are still available, and there's no good reason to shut down the SVIs on the secondary.
It is thus recommended to complete the vPC domain configuration with the following command:
vpc domain 1
  dual-active exclude interface-vlan <SVIs associated with FabricPath VLANs>

Connecting FabricPath Edge or Leaf Layer to the Spine Layer

When connecting the FabricPath edge layer to the spine layer, you can use any ports on the spine layer F1 cards unless the scalability of MAC addresses in the topology is a key design factor. In that case, it is beneficial to maximize the use of the SoCs in the F1 card.
To do this, distribute the edge switches so that they use different SoCs. As an example, you could connect edge-1 to ports 1 and 2 of the F1 card, edge-2 to ports 3 and 4, and so on (Figure 23).

Figure 23. Best way to connect edge to spine F1 card
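The port assignment described above can be sketched in Python. This is a minimal illustration, not a statement about the hardware: it assumes that two consecutive front-panel ports share one SoC (ports 1 and 2 on the first SoC, ports 3 and 4 on the second, and so on), which is inferred from the example above, and the names `soc_for_port` and `assign_edges` are hypothetical.

```python
PORTS_PER_SOC = 2  # assumption: ports 1-2 share SoC 1, ports 3-4 share SoC 2, ...


def soc_for_port(port):
    """Map a 1-based front-panel port number to a 1-based SoC index."""
    if port < 1:
        raise ValueError("port numbers start at 1")
    return (port - 1) // PORTS_PER_SOC + 1


def assign_edges(edge_names, soc_count=16):
    """Give each edge switch the group of ports served by its own SoC."""
    if len(edge_names) > soc_count:
        raise ValueError("more edge switches than SoCs; some must share a SoC")
    plan = {}
    for index, name in enumerate(edge_names):
        first = index * PORTS_PER_SOC + 1
        plan[name] = list(range(first, first + PORTS_PER_SOC))
    return plan


if __name__ == "__main__":
    for edge, ports in assign_edges(["edge-1", "edge-2", "edge-3"]).items():
        print(edge, "-> ports", ports, "on SoC", soc_for_port(ports[0]))
```

Under this assumption, each edge switch consumes the forwarding table of exactly one SoC, so no two edge switches compete for the same 16k entries.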

Avoiding Flooding by Tuning ARP and Layer 2 Table

The default aging time of the Layer 2 forwarding table on the Cisco Nexus 7000 Series is 1800 seconds. The default Address Resolution Protocol (ARP) timeout is 1500 seconds. To avoid flooding routed traffic, make sure that on the spine the ARP timeout is shorter than the Layer 2 forwarding table aging time, which is the case with the default values.
Tuning the MAC address table on the edge device is not necessary because in this architecture, the edge doesn't perform learning based on the ARP. The edge device in a FabricPath topology learns MAC addresses only for active conversations.
In the spine, the presence of the M1 card in the same VDC as the F1 card requires the F1 card to learn MAC addresses for the sole purpose of rewriting the routed traffic with the FabricPath information. In this specific case, the SoCs learn MAC addresses based on ARP, but they don't use this MAC information for Layer 2 traffic forwarding.
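The timer relationship described in this section can be expressed as a simple check. The sketch below is illustrative: the function name `arp_refreshes_before_mac_ages` is hypothetical, and the only values used are the two defaults quoted above.

```python
DEFAULT_MAC_AGING_S = 1800    # Layer 2 forwarding table aging default quoted above
DEFAULT_ARP_TIMEOUT_S = 1500  # ARP timeout default quoted above


def arp_refreshes_before_mac_ages(arp_timeout_s=DEFAULT_ARP_TIMEOUT_S,
                                  mac_aging_s=DEFAULT_MAC_AGING_S):
    """Return True when ARP entries expire (and are refreshed) before the
    corresponding Layer 2 entries age out, which avoids flooding routed traffic."""
    return arp_timeout_s < mac_aging_s


if __name__ == "__main__":
    print("defaults safe:", arp_refreshes_before_mac_ages())
    print("ARP 3600 s with MAC aging 1800 s safe:",
          arp_refreshes_before_mac_ages(arp_timeout_s=3600))
```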

Multicast Routing Considerations

Multicast routing and FabricPath switching do not require particular configuration or tuning on the aggregation layer devices. Only PIM may require tuning if you want to reduce failover times, just like you would do in regular routed designs.

Multicast Forwarding in FabricPath

As previously mentioned, FabricPath builds two multidestination trees with two different roots: one for FTag1 and one for FTag2. At Layer 2, multicast traffic is hashed to either tree in order to utilize both of them. The hashing to either multidestination tree is platform-dependent; for instance, it can include the VLAN field or the IP address fields.
In order to maximize the efficiency of the traffic distribution, it is advisable to manually set the priority for the root for the FTag1 and FTag2 at the spine:
Agg1:
fabricpath domain default
  root-priority 66
Agg2:
fabricpath domain default
  root-priority 65
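The effect of these priorities can be sketched as follows. This is a simplified model under stated assumptions: it assumes that the switch with the highest root priority becomes the root of the FTag1 tree and the next-highest the root of the FTag2 tree, and it breaks ties on switch-id only. The function name `elect_roots`, the switch-id values, and the edge switch priority of 64 are hypothetical.

```python
def elect_roots(switches):
    """switches maps a name to (root_priority, switch_id).

    Returns (ftag1_root, ftag2_root), ranking by highest priority first and
    using the switch-id as a simplified tie-breaker.
    """
    if len(switches) < 2:
        raise ValueError("two multidestination trees need at least two switches")
    ranked = sorted(switches, key=lambda name: switches[name], reverse=True)
    return ranked[0], ranked[1]


if __name__ == "__main__":
    fabric = {
        "Agg1": (66, 1),          # root-priority 66, as configured above
        "Agg2": (65, 2),          # root-priority 65, as configured above
        "DC1-5500-1": (64, 111),  # hypothetical edge switch values
    }
    print(elect_roots(fabric))
```

With the priorities configured above, the two spines end up as the roots of the two trees, so multidestination traffic is split between them.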
Traffic load-balancing can be configured with the following command:
DC1-Agg1(config)# fabricpath load-balance multicast ?
destination Include destination parameters
[...]
source-destination Include source and destination parameters
symmetric Symmetric (default)
xor Include ex-or of source and destination parameters
IGMP snooping is used to build the Layer 2 multicast MAC forwarding table at the edge. The information is used to prune traffic on a per-VLAN basis on the two multidestination trees.
You can verify the traffic path for a given multicast group using the following command:
DC1-Agg1# show fabricpath mroute
(ftag/1, vlan/101, 0.0.0.0, 239.1.1.5), uptime: 00:30:24, isis
Outgoing interface list: (count: 2)
Interface port-channel10, uptime: 02:15:06, isis
Interface port-channel10, uptime: 00:45:54, isis
(ftag/1, vlan/101, 0.0.0.0, 239.1.1.6), uptime: 00:25:27, isis
Outgoing interface list: (count: 1)
Interface port-channel10, uptime: 00:45:54, isis

Routed Multicast Configuration

It's outside of the scope of this document to describe all the details of the routed multicast configuration, since it follows the same rules as in a non-FabricPath design:

• Enable PIM under the SVI interface.

• IGMP is automatically enabled under the SVI when PIM is enabled.

• Define the rendezvous points (RP) in the network and/or configure the spine device for auto-RP. NX-OS provides a specific functionality, based on RFC 4610, called Anycast-RP Using Protocol Independent Multicast (PIM). This functionality makes it possible to provide RP redundancy in a simpler fashion than when deploying Multicast Source Discovery Protocol (MSDP).

The following is a sample configuration:
ip pim anycast-rp 1.1.1.100 1.1.1.1
ip pim anycast-rp 1.1.1.100 1.1.1.2
ip pim rp-address 1.1.1.100 group-list 224.0.0.0/4
interface loopback100
  description anycast-RP
  ip address 1.1.1.100/32
  ip router ospf 10 area 0.0.0.0
  ip pim sparse-mode
From an operational perspective, it may be advisable to align the PIM designated router (DR) priority with the HSRP primary.

Note: The FTag selection for routed multicast traffic may change after routing. Post-routed traffic may be forwarded on a different FTag tree than the prerouted traffic.

PIM timers are by default very conservative, and they are the cause of the slower convergence times that are measured for multicast traffic. You can bring them down significantly by either tuning the PIM timers or by using Bidirectional Forwarding Detection (BFD) for PIM.
Here is an example of the configuration:
bfd interval 50 min_rx 100 multiplier 3
interface Vlan101
  [...]
  ip pim sparse-mode
  ip pim bfd-instance
  ip pim dr-priority 10
  ip pim hello-interval 1000

Note: PIM interval tuning should be considered with care because when deployed on too many interfaces, it may increase the CPU utilization. BFD scales better. For information on scalability, please refer to the release notes and/or the configuration limits.
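To see why BFD detects a failed PIM neighbor faster than tuned hellos, the two detection times can be compared with a little arithmetic. The Python sketch below is an approximation under stated assumptions: both peers use the BFD values from the example, negotiation follows the standard asynchronous-mode rule from RFC 5880 (the agreed interval is the larger of the local minimum receive interval and the remote transmit interval), and the PIM neighbor holdtime is the protocol default of 3.5 times the hello interval. The function names are hypothetical, and platform-specific behavior may differ.

```python
def bfd_detection_time_ms(local_min_rx_ms, remote_tx_ms, remote_multiplier):
    """Detection time seen locally: remote multiplier x agreed interval."""
    agreed_interval_ms = max(local_min_rx_ms, remote_tx_ms)
    return agreed_interval_ms * remote_multiplier


def pim_holdtime_ms(hello_interval_ms):
    """Default PIM neighbor holdtime: 3.5 x hello interval."""
    return 3.5 * hello_interval_ms


if __name__ == "__main__":
    # Values from the example: bfd interval 50 min_rx 100 multiplier 3,
    # ip pim hello-interval 1000 (milliseconds).
    print("BFD detection (ms):", bfd_detection_time_ms(100, 50, 3))
    print("PIM holdtime (ms):", pim_holdtime_ms(1000))
```

Under these assumptions, BFD declares the neighbor down roughly an order of magnitude sooner than the tuned PIM hello, without requiring sub-second hellos on every interface.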

Routed Multicast with vPC+

When using vPC+ in conjunction with the FTag priority settings that were previously described, the hashing of routed multicast traffic is optimized, as illustrated in Figure 24.

Figure 24. Multicast routing with FabricPath

This means there is a sort of "affinity behavior," based on which each vPC+ peer device will always use the multicast destination tree rooted on itself to forward multicast traffic.

Summary Recommendations

This section summarizes the recommendations for building a Layer 3 spine in a FabricPath topology. Figure 25 illustrates these recommendations.

• Configure VLANs in FabricPath mode.

• Consider using manually configured switch-ids. This can help with managing and operating the topology since you can define the numbering scheme.

• Configure SVIs and HSRP as usual.

• Configure vPC+ by defining a vPC peer-link and an emulated switch-id in the vPC domain. This is to allow integration of legacy (non-FabricPath-enabled) edge layer devices and to provide Active/Active default-gateway data-plane functionality.

• Configure Layer 3 peering between routers over a FabricPath VLAN (for instance, on the same link that is used to interconnect the spine devices, such as the vPC+ peer-link). This is required to reroute traffic destined to the northbound Layer 3 domain if one of the aggregation devices loses physical connectivity to the upstream devices.

• Use FabricPath-enabled links for vPC peer-link purposes.

• Configure dual-active exclude interface-vlan for the list of SVIs (unless using vPC attached devices).

• Distribute edge switches onto different SoCs on the F1 card for maximum MAC address scalability.

• At the spine, you may want to verify that the ARP aging timeout is shorter than the Layer 2 forwarding table timeout, even though this is the default configuration.

Figure 25. Summary recommendations for FabricPath designs

Design Considerations for the Edge and Access Layer

The edge layer in FabricPath provides Classic Ethernet connectivity to servers and interfaces with FabricPath core ports to the spine layer.
The edge layer performs unicast Layer 2 multipathing load balancing on equal-cost-path Layer 2 routes.
Flooded and unknown unicast traffic is sent on the multidestination tree associated with FTag1. Multicast traffic is load balanced on the two multidestination trees (identified as usual with FTag1 and FTag2).

FabricPath VLANs

As with the spine layer, an access port in a FabricPath topology is configured by assigning it to a FabricPath VLAN.
Start the configuration by making the VLAN a FabricPath VLAN. The result can be verified as follows:
DC1-5500-1# show vlan id 101
VLAN Name Status Ports
---- -------------------------------- --------- -------------------------------
101 FP_VLAN_101 active Eth1/5
VLAN Type Vlan-mode
---- ----- ----------
101 enet FABRICPATH
The ports connecting the edge device to the spine or even to another edge device should be configured as FabricPath core ports:
(config-if)# switchport mode fabricpath
The ports connecting to servers are configured as switchport mode access or switchport mode trunk. They forward traffic with the regular Ethernet encapsulation.
To distribute unicast traffic via equal-cost multipath (ECMP), use the following command:
DC1-5500-2(config)# fabricpath load-balance unicast <destination, include-vlan, layer-3, layer-4, mixed, source, source-destination>

Note: By default, the unicast hashing is configured to include the source and destination IP addresses.
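Flow-based hashing of this kind can be illustrated in a few lines of Python. The sketch below is not the hardware hash, which is platform-dependent; the function name `pick_path` and the use of CRC32 are assumptions chosen only to show that all frames of one source/destination pair follow the same equal-cost path while different pairs can spread across paths.

```python
import zlib


def pick_path(src_ip, dst_ip, paths):
    """Choose one equal-cost path for a flow from its source and destination IPs."""
    if not paths:
        raise ValueError("at least one equal-cost path is required")
    key = f"{src_ip}>{dst_ip}".encode()
    # CRC32 is deterministic, so the same flow always maps to the same index.
    return paths[zlib.crc32(key) % len(paths)]


if __name__ == "__main__":
    uplinks = ["Eth1/1", "Eth1/3"]
    for last_octet in range(10, 14):
        src = f"10.20.0.{last_octet}"
        print(src, "->", pick_path(src, "10.30.0.10", uplinks))
```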

MAC Learning at the Access Layer

The edge layer by default performs conversational learning. Conversational learning spares MAC address table consumption at the edge because remote MAC addresses are not learned in the Layer 2 forwarding table unless there's an active conversation between a local node and a remote node.
If you configure an SVI on a FabricPath VLAN, conversational learning is disabled on this VLAN. It is therefore normally recommended to not configure management SVIs on FabricPath VLANs.
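The learning rule described above can be modeled in a few lines of Python. This is a deliberately simplified sketch: the class `EdgeSwitch` and its method names are hypothetical, MAC aging is ignored, and the special handling of gateway vMAC addresses is left out.

```python
class EdgeSwitch:
    """Toy model of conversational MAC learning on a FabricPath edge switch."""

    def __init__(self):
        self.local = {}   # MAC address -> local Classic Ethernet port
        self.remote = {}  # MAC address -> remote FabricPath switch-id

    def receive_from_host(self, src_mac, port):
        """Frames arriving on edge ports always teach the local source MAC."""
        self.local[src_mac] = port

    def receive_from_fabric(self, src_mac, dst_mac, src_switch_id):
        """Learn a remote source MAC only if the frame targets a known local host."""
        if dst_mac in self.local:
            self.remote[src_mac] = src_switch_id
            return True
        return False


if __name__ == "__main__":
    switch = EdgeSwitch()
    switch.receive_from_host("0000.0000.000a", "Eth1/5")
    # Unicast to a local host: part of a conversation, so the source is learned.
    switch.receive_from_fabric("0000.0000.000b", "0000.0000.000a", 112)
    # Broadcast from another remote host: no conversation, so nothing is learned.
    switch.receive_from_fabric("0000.0000.000c", "ffff.ffff.ffff", 113)
    print(switch.remote)
```

The second frame leaves the table untouched, which is how the edge switch avoids filling its Layer 2 table with remote hosts that none of its local hosts talk to.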

Learning the HSRP MAC Address

The HSRP and GLBP MAC addresses - and in general the gateway vMAC addresses - are of particular interest because there's never an active conversation between an end host and a vMAC. The reason is that when the server sends traffic to the gateway, it does use the vMAC as the destination MAC address, but when the router sends traffic to the end host, it never uses the vMAC as source MAC address; instead, it uses the burned-in address (BIA).
If the edge switch did not learn the gateway vMAC address, all traffic destined to be routed would be continuously flooded. The mechanism that solves this problem consists of treating HSRP, GLBP, or VRRP hellos as part of the conversation with an end host. When a host sends traffic destined to the gateway vMAC address, the edge switch stores the vMAC temporarily in the Layer 2 forwarding table. As soon as an HSRP hello is received, the edge switch programs the switch-id associated with it.
If the spine is configured for vPC+, the source switch-id is the emulated switch-id. As a result, the MAC address table of the edge switch will have the HSRP MAC associated with the emulated switch-id.
DC1-5500-2# show mac address-table vlan 101
Legend:
* - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
age - seconds since last seen,+ - primary entry using vPC Peer-Link
VLAN MAC Address Type age Secure NTFY Ports
---------+-----------------+--------+---------+------+----+----------
* 101 0000.0c07.ac01 dynamic 0 F F 1000

Direct Connectivity between Edge Switches

One interesting point to observe is that edge switches can be connected directly to each other. This can be used to provide a direct path between hosts. Consider Figure 26:

Figure 26. Direct connectivity between edge switches allows direct communication between servers

When DC1-5500-2 is connected to DC1-5500-3, the two hosts can talk directly over this link without having to go via the spine switches. This is unique to FabricPath; Spanning Tree Protocol would have blocked this link.

vPC+ at the Edge

The edge switches can be configured to perform vPC-like functions, which in the particular case of a FabricPath infrastructure are referred to as vPC+ (as already discussed for providing Active/Active default-gateway functionalities on the spine/aggregation devices).
Functions that are enabled by vPC+ at the edge include:

• The ability to attach servers to edge switches with port-channel teaming

• The ability to attach additional Classic Ethernet switches in vPC mode

• The ability to attach Cisco Fabric Extenders in FEX Active/Active mode

The key requirement for a vPC+ configuration to be operational in FabricPath is to define the emulated switch-id in the vPC domain configuration. The other requirement is that the vPC peer-link must be a FabricPath link. All VLANs on vPC+ links must be FabricPath VLANs.
When the peer-link goes down, vPC+ operates similarly to vPC:

• When all the links of the peer-link fail, the primary switch will take no action, and the secondary switch will bring down all its vPC links.

• In addition, vPC+ also intervenes to prevent attracting traffic to the vPC secondary switch. This function is provided by IS-IS by ensuring that the secondary switch stops advertising reachability to the emulated switch. This is essential to prevent the secondary switch from attracting traffic destined for the emulated switch from the FabricPath cloud.

Figure 27 illustrates a design with vPC+:

Figure 27. Using vPC+ on the edge switches

In such a topology, the MAC addresses associated with a pair of edge devices running vPC+ are advertised to the other switches that are part of the same FabricPath domain as belonging to the emulated switch-id, instead of the individual switch-ids of the edge Cisco Nexus 5500 Switches.
Overall, with vPC+ most of the communication between devices that connect to the vPC+ peers is kept local and doesn't involve the spines. This is true both for vPC-connected devices and for orphan ports. Orphan ports in vPC+ can communicate over the peer-link as if this link were a regular FabricPath link (Figure 28):

Figure 28. Communication between devices connected to a vPC+ edge

In addition, even if the vPC+ peer-link were to fail, Host1 and Host4 would not lose connectivity because they would still be able to communicate via the spines. In Figure 28, Host1 and Host4 send traffic to the switch-id of the respective edge switches and not to the emulated switch-id. As for the north-to-south traffic flows destined to the hosts supported by vPC links, they are sent only to the vPC primary, because the vPC secondary device stops advertising edge MAC address information associated with the emulated switch-id (Figure 29).

Figure 29. vPC+ behavior upon loss of the peer-link

Integration with FEX

You can extend FabricPath topologies by adding Cisco Fabric Extenders. Fabric extender ports are edge ports, and if you are using vPC+, they allow creating vPCs between servers and FEX. Dual-layer vPC is also implemented.

Summary

In summary, at the edge layer there is no major tuning necessary. The configuration is very simple:

• You should configure VLANs as FabricPath VLANs.

• Consider using manually configured switch-ids. This can help with managing and operating the topology since you can define the numbering scheme.

• To avoid disabling conversational learning, make sure not to have an SVI (like the management SVI) on a FabricPath VLAN.

• Consider the use of vPC+ if using port-channel teaming or FEX Active/Active or adapter-FEX/VM-FEX.

FabricPath Scalability Considerations

In terms of FabricPath scalability, there are three main elements you should consider in a given FabricPath domain:

• Number of switches

• Number of VLANs

• Number of MAC addresses

The maximum number of switches and VLANs supported (that is, tested by QA) in the same FabricPath domain is continuously increasing at every Cisco NX-OS Software release. At the time of writing of this document, these are the expected supported values:

• Cisco Nexus 7000 Series (NX-OS Release 6.0): 128 switch-ids and between 2000 and 4000 VLANs

• Cisco Nexus 5500 Switches (NX-OS Release 5.1(3)N1(1)): 128 switch-ids and between 2000 and 4000 VLANs

Note: For updated information on the maximum number of switch-ids supported within the same FabricPath domain, please refer to the release notes and the configuration limits documentation for each specific software release.

One of the benefits of FabricPath is the scalability both in terms of MAC addresses and in terms of throughput/oversubscription ratios.
In terms of MAC address scalability, we should differentiate between edge and spine devices.
The edge switches learn only MAC addresses for which there are active conversations established (conversational learning). The focus of this document is on using the Cisco Nexus 5500 Switches at the edge of the FabricPath network. The current maximum value of MAC addresses supported on these devices is 24k.
For the spine devices, it is important to distinguish between east-to-west (switched) and north-to-south (routed) traffic flows:

• East-west traffic: These flows are switched by the spine devices based on the outer MAC header information (FabricPath switch-id values) (see Figure 30). For the spine devices to perform this switching, there is technically no need for them to learn the MAC addresses of the endpoints connected to the FabricPath edge switches.

Figure 30. FabricPath spines switch traffic based on the Destination Switch-id (DSID)

• North-south traffic: Traffic coming from the Layer 3 network domain and directed to an endpoint connected to a FabricPath edge switch needs to be first handled by the M1 card (performing routing functionalities) and eventually sent out to the FabricPath core through an F1 interface.

Figure 31. Traffic routing by M1 card and subsequent FabricPath encapsulation by F1 card

This essentially means that the MAC addresses scalability is mainly dictated by the CAM table available on the M1 card, which is capable of containing 128k addresses. The M1 card has all the information needed (MAC address, switch-id of the edge device) to internally forward the traffic to the F1 interface that is properly connected to the destination FabricPath edge switch.
However, as previously discussed in the section "MAC Scalability Considerations," the endpoint MAC addresses are also learned on the F1 card. This is because in the current implementation, the F1 card needs to associate a specific value (named local ID [LID]) with each remote MAC address and insert it into the FabricPath header before sending the frame toward the edge switch (Figure 32).

Figure 32. F1 learning MAC addresses for routed traffic purposes

The end result is that each SoC currently learns all the remote MAC addresses for endpoints attached to the FabricPath edge switches that connect to it. This happens independently of whether or not traffic belongs to the same VLAN, which brings the maximum MAC scalability to 256k per F1 card (since it supports 16 SoCs, each capable of holding up to 16k MAC addresses). By properly assigning edge switches to spine ports, it is theoretically possible to use the aggregated capacity of the SoCs on the spine F1 cards for up to 128k MAC addresses for routed traffic.
However, at the time of this writing, the maximum number of MAC addresses that can be stored for routing purposes without incurring flooding is 16k. The reason is described by the following examples. Assume that one of the uplinks connecting a FabricPath edge switch to the spine fails, as shown in Figure 33:

Figure 33. Failure scenarios for routed traffic

Traffic that originated from the upstream Layer 3 core devices can still be sent to the left aggregation switch, but it needs to use the transit connection (vPC+ peer-link) between the spines to be directed toward the FabricPath edge switch. As a consequence, the MAC addresses of the endpoints connected to that specific FabricPath edge switch will need to be installed on the MAC tables of the SoCs associated to these peer-link interfaces. This would basically limit to 16k the total amount of MAC addresses that can be learned on the FabricPath edge switch, in order to avoid flooding the routed traffic whose MACs have not been learned.
Obviously, the situation would become worse if multiple uplinks belonging to separate FabricPath switches fail at the same time, leading to an extremely unlikely case where the maximum MAC scalability supported across the entire FabricPath domain is reduced to 16k MAC addresses (Figure 34).

Figure 34. Impact of multiple links failure for routed traffic

If this unlikely scenario is a concern, it is possible to deploy port-channels (instead of single links) to interconnect each FabricPath edge device to the spine switches. In that way, a single link failure would not force the traffic onto the peer-link.
It is also worth noting that a specific functionality, named mac-proxy, is on the roadmap for a future NX-OS software release. It will help eliminate the issue described above by removing the need for the F1 SoCs to learn endpoint MAC address information. Once this functionality is available (currently planned for 1HCY12), it would truly be possible to consider the MAC address scalability of a FabricPath domain as dictated only by the M1 MAC table space (minimum 128k).
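The capacity figures quoted in this section follow from simple arithmetic, which the Python sketch below reproduces. It is a simplified model: the function name `routed_mac_capacity` is hypothetical, and the failure case assumes the worst situation described above, in which all routed traffic transits the SoC that serves the peer-link.

```python
K = 1024
SOC_COUNT = 16            # SoCs per F1 card, as stated above
SOC_MAC_ENTRIES = 16 * K  # MAC entries per SoC
M1_MAC_ENTRIES = 128 * K  # MAC entries on the M1 card


def routed_mac_capacity(peer_link_transit=False):
    """Upper bound on routed MAC addresses that avoid flooding.

    With peer_link_transit=True, routed traffic is assumed to cross the
    peer-link, so a single SoC table becomes the limit.
    """
    if peer_link_transit:
        return SOC_MAC_ENTRIES
    f1_aggregate = SOC_COUNT * SOC_MAC_ENTRIES
    return min(f1_aggregate, M1_MAC_ENTRIES)


if __name__ == "__main__":
    print("F1 aggregate:", SOC_COUNT * SOC_MAC_ENTRIES // K, "k")
    print("normal operation:", routed_mac_capacity() // K, "k")
    print("peer-link transit:", routed_mac_capacity(True) // K, "k")
```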
For Layer 2 traffic (that is, traffic that is not directed to the default gateway), the MAC address scalability is dependent only on the MAC scalability of the edge devices. This is even more relevant for VLANs that do not require routing, such as the "isolated" networks constituted by the virtual server building blocks of typical virtual data centers (vDCs), like the ones depicted in Figure 35:

Figure 35. vDCs use many "isolated" VLANs

In such cases, the MAC address scalability for isolated VLANs is entirely decoupled from the F1 learning component.
In all cases, east-to-west Layer 2 traffic doesn't use the MAC address table of the spine devices.
Finally, the ability to do unicast multipathing on up to 16 different equal-cost paths makes it possible to place up to 16 spines in the core, thus lowering the oversubscription and providing an aggregated bandwidth that equals the sum of the bandwidth provided at the spine layer by each spine.

FabricPath Convergence Times

Cisco FabricPath improves significantly on the convergence times for any given failure. For most unicast failures, the traffic loss is within a few hundred milliseconds. For routed traffic, the worst convergence time is the time that it takes for routers to do an SPF calculation. Considering the various throttling mechanisms, this amounts to ~5 seconds, which can be further lowered by changing the default OSPF timers on the adjacent routing devices. The configuration required to achieve this is very simple:
router ospf 10
  timers throttle spf 10 100 5000
  timers throttle lsa all 10 100 5000
  timers lsa arrival 80
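The three values in the SPF throttle command are a start delay, a hold time, and a maximum wait, all in milliseconds. The Python sketch below models the commonly described backoff behavior as an assumption: the first SPF run waits for the start delay, the next waits for the hold time, and each further wait doubles until it reaches the maximum. The function name `spf_wait_schedule` is hypothetical, and actual platform behavior may differ in detail.

```python
def spf_wait_schedule(start_ms, hold_ms, max_ms, events):
    """Return the wait before each of `events` back-to-back SPF runs."""
    waits = []
    hold = hold_ms
    for index in range(events):
        if index == 0:
            waits.append(start_ms)           # first run: initial delay only
        else:
            waits.append(min(hold, max_ms))  # later runs: current hold time
            hold = min(hold * 2, max_ms)     # exponential backoff, capped
    return waits


if __name__ == "__main__":
    # Values from the example: timers throttle spf 10 100 5000
    print(spf_wait_schedule(10, 100, 5000, 8))
```

With these values, an isolated failure triggers an SPF run after about 10 ms, while a flapping link is progressively dampened up to the 5-second ceiling.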
For Layer 2, multicast convergence times are also within a few hundred milliseconds, while for routed multicast, PIM must be tuned via BFD and/or more aggressive timers than the defaults. Routed multicast with proper tuning can converge within a couple of seconds.
Notice that these slower convergence times are not due to FabricPath but to default timers in routing protocols. FabricPath has failure convergence times of a few hundred milliseconds.
For better failover times, consider the following:

• Use vPC+ at the spine in order for HSRP to be advertised with the emulated switch-id.

• Consider tuning the default OSPF SPF timers to avoid throttling SPF recalculations too much.

• Consider tuning the default PIM timers and/or using BFD.

Sample Configurations

This section illustrates how to configure the Cisco Nexus 7000 Series Switches and Cisco Nexus 5500 Switches for a spine-edge design with the spine also providing the Layer 3 gateway function. The configuration samples refer to the network topology shown in Figure 36.

Figure 36. Reference Topology for the configurations provided in this document

Note that in the topology shown in Figure 36 and in the following configuration samples, the F1 line card is inserted in slot 8, and the M1 line card is inserted in slot 10.

Cisco Nexus 7000 Series Spine 1 (DC1-Agg1)

install feature-set fabricpath
hostname DC1-Agg1
vdc DC1-Agg1 id 1
allow feature-set fabricpath
[...]
feature-set fabricpath
[...]
feature ospf
feature pim
feature interface-vlan
feature hsrp
feature lacp
feature glbp
feature vpc
feature pong
feature bfd
bfd interval 50 min_rx 100 multiplier 3
vlan 101
mode fabricpath
name FP_VLAN_101
vlan 102
mode fabricpath
name FP_VLAN_102
vlan 103
mode fabricpath
name FP_VLAN_103
vlan 333
mode fabricpath
name Aggr_IGP_Peering
vpc domain 1
role priority 110
peer-keepalive destination 10.60.17.141
delay restore 3600
dual-active exclude interface-vlan 101-104
fabricpath switch-id 1000
interface Vlan101
no shutdown
ip address 10.20.0.2/16
ip ospf passive-interface
ip router ospf 10 area 0.0.0.0
ip pim sparse-mode
ip pim bfd-instance
ip pim dr-priority 10
ip pim hello-interval 1000
hsrp 1
preempt delay reload 300
priority 110
ip 10.20.0.1
interface Vlan102
no shutdown
ip address 10.30.0.2/16
ip ospf passive-interface
ip router ospf 10 area 0.0.0.0
ip pim sparse-mode
ip pim bfd-instance
ip pim hello-interval 1000
hsrp 1
preempt delay reload 300
priority 110
ip 10.30.0.1
interface Vlan103
no shutdown
ip address 10.40.0.2/16
ip ospf passive-interface
ip router ospf 10 area 0.0.0.0
ip pim sparse-mode
hsrp 1
preempt delay reload 300
priority 110
ip 10.40.0.1
interface Vlan333
no shutdown
description Aggr_IGP_Peering
no ip redirects
ip address 11.1.10.1/24
ip router ospf 10 area 0.0.0.0
ip pim sparse-mode
interface port-channel1
description FP Link to DC1-5500-1
switchport
switchport mode fabricpath
interface port-channel10
description vPC+ Peer-Link
switchport
switchport mode fabricpath
vpc peer-link
interface Ethernet8/1
description vPC+ Peer-Link Member 1
switchport mode fabricpath
channel-group 10 mode active
no shutdown
interface Ethernet8/2
description vPC+ Peer-Link Member 2
switchport mode fabricpath
channel-group 10 mode active
no shutdown
interface Ethernet8/3
description Port-channel to DC1-5500-1 Member 1
switchport mode fabricpath
channel-group 1 mode active
no shutdown
interface Ethernet8/4
description Port-channel to DC1-5500-1 Member 2
switchport mode fabricpath
channel-group 1 mode active
no shutdown
interface Ethernet8/5
description FP Link to DC1-5500-2
switchport mode fabricpath
no shutdown
interface Ethernet8/7
description FP Link to DC1-5500-3
switchport mode fabricpath
no shutdown
interface Ethernet10/5
description L3 Link to core
ip address 12.1.5.1/24
ip router ospf 10 area 0.0.0.0
ip pim sparse-mode
no shutdown
interface mgmt0
ip address 10.60.17.140/24
interface loopback0
description local-loopback
ip address 1.1.1.1/32
ip router ospf 10 area 0.0.0.0
interface loopback100
description anycast-RP
ip address 1.1.1.100/32
ip router ospf 10 area 0.0.0.0
ip pim sparse-mode
router ospf 10
router-id 1.1.1.1
timers throttle spf 10 100 5000
timers throttle lsa all 10 100 5000
timers lsa arrival 80
fabricpath domain default
root-priority 66
topology 1
ip pim rp-address 3.3.3.3 group-list 224.0.0.0/4
ip pim ssm range 232.0.0.0/8
fabricpath switch-id 121

Cisco Nexus 7000 Series Spine 2 (DC1-Agg2)

install feature-set fabricpath
hostname DC1-Agg2
vdc DC1-Agg2 id 1
allow feature-set fabricpath
[...]
feature-set fabricpath
feature vrrp
cfs eth distribute
feature ospf
feature pim
feature msdp
feature interface-vlan
feature hsrp
feature lacp
feature glbp
feature vpc
feature pong
feature bfd
bfd interval 50 min_rx 100 multiplier 3
vrf context management
ip route 0.0.0.0/0 10.60.17.254
vlan 101
mode fabricpath
name FP_VLAN_101
vlan 102
mode fabricpath
name FP_VLAN_102
vlan 103
mode fabricpath
name FP_VLAN_103
vlan 333
mode fabricpath
name Aggr_IGP_Peering
vpc domain 1
peer-keepalive destination 10.60.17.140
delay restore 3600
dual-active exclude interface-vlan 101-104
fabricpath switch-id 1000
interface Vlan101
no shutdown
no ip redirects
ip address 10.20.0.3/16
ip ospf passive-interface
ip router ospf 10 area 0.0.0.0
ip pim sparse-mode
ip pim bfd-instance
ip pim hello-interval 1000
hsrp 1
ip 10.20.0.1
interface Vlan102
no shutdown
no ip redirects
ip address 10.30.0.3/16
ip ospf passive-interface
ip router ospf 10 area 0.0.0.0
ip pim sparse-mode
ip pim bfd-instance
ip pim hello-interval 1000
hsrp 1
ip 10.30.0.1
interface Vlan103
no shutdown
no ip redirects
ip address 10.40.0.3/16
ip ospf passive-interface
ip router ospf 10 area 0.0.0.0
ip pim sparse-mode
hsrp 1
ip 10.40.0.1
interface Vlan333
no shutdown
description Aggr_IGP_Peering
no ip redirects
ip address 11.1.10.2/24
ip router ospf 10 area 0.0.0.0
ip pim sparse-mode
interface port-channel2
description FP Link to DC1-5500-1
switchport
switchport mode fabricpath
interface port-channel10
description vPC+ Peer-Link
switchport
switchport mode fabricpath
vpc peer-link
interface Ethernet8/1
description vPC+ Peer-Link Member 1
switchport mode fabricpath
channel-group 10 mode active
no shutdown
interface Ethernet8/2
description vPC+ Peer-Link Member 2
switchport mode fabricpath
channel-group 10 mode active
no shutdown
interface Ethernet8/3
description FP Link to DC1-5500-1
switchport mode fabricpath
channel-group 2 mode active
no shutdown
interface Ethernet8/4
description FP Link to DC1-5500-1
switchport mode fabricpath
channel-group 2 mode active
no shutdown
interface Ethernet8/5
description FP Link to DC1-5500-2
switchport mode fabricpath
no shutdown
interface Ethernet8/7
description FP Link to DC1-5500-3
switchport mode fabricpath
no shutdown
interface Ethernet10/5
description L3 Link to core
ip address 12.1.6.1/24
ip router ospf 10 area 0.0.0.0
ip pim sparse-mode
no shutdown
interface mgmt0
ip address 10.60.17.141/24
interface loopback0
description local-loopback
ip address 1.1.1.2/32
ip router ospf 10 area 0.0.0.0
interface loopback100
description anycast-RP
ip address 1.1.1.100/32
ip router ospf 10 area 0.0.0.0
ip pim sparse-mode
router ospf 10
router-id 1.1.1.2
timers throttle spf 10 100 5000
timers throttle lsa all 10 100 5000
timers lsa arrival 80
fabricpath domain default
root-priority 65
topology 1
ip pim rp-address 3.3.3.3 group-list 224.0.0.0/4
fabricpath switch-id 122
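
The two spine configurations above use the same emulated switch-id (1000) under `vpc domain 1` and different physical switch-ids (121 and 122). The sketch below shows one way to check that relationship offline. It is a minimal illustration, assuming the configuration text has sub-commands indented under their parent as in `show running-config` output; the function names and the abbreviated excerpts are hypothetical.

```python
def switch_ids(config: str) -> dict[str, int]:
    """Return the emulated (vPC+ domain) and physical FabricPath switch-ids.

    Assumption: sub-commands are indented under their parent command.
    """
    ids = {}
    in_vpc_domain = False
    for line in config.splitlines():
        if not line.strip():
            continue
        indented = line[0].isspace()
        text = line.strip()
        if not indented:
            # A new top-level command opens or closes the vpc domain context.
            in_vpc_domain = text.startswith("vpc domain")
        if text.startswith("fabricpath switch-id"):
            key = "emulated" if (indented and in_vpc_domain) else "physical"
            ids[key] = int(text.split()[-1])
    return ids


def check_vpc_plus_pair(cfg_a: str, cfg_b: str) -> list[str]:
    """List the switch-id problems found between two vPC+ peers."""
    a, b = switch_ids(cfg_a), switch_ids(cfg_b)
    problems = []
    if a.get("emulated") is None or a.get("emulated") != b.get("emulated"):
        problems.append("emulated switch-id missing or different between peers")
    if a.get("physical") is None or a.get("physical") == b.get("physical"):
        problems.append("physical switch-id missing or not unique")
    return problems


# Abbreviated excerpts of the two spine configurations, indentation restored.
AGG1 = "vpc domain 1\n  fabricpath switch-id 1000\nfabricpath switch-id 121\n"
AGG2 = "vpc domain 1\n  fabricpath switch-id 1000\nfabricpath switch-id 122\n"

if __name__ == "__main__":
    print(check_vpc_plus_pair(AGG1, AGG2) or "switch-ids are consistent")
```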

Edge Cisco Nexus 5500 Switch (DC1-5500-1) without vPC+

install feature-set fabricpath
feature-set fabricpath
hostname DC1-5500-1
feature telnet
no feature ssh
cfs eth distribute
feature interface-vlan
feature lacp
feature lldp
vrf context management
ip route 0.0.0.0/0 10.60.17.254
vlan 1
vlan 101
mode fabricpath
name FP_VLAN_101
vlan 102
mode fabricpath
name FP_VLAN_102
vlan 103
mode fabricpath
name FP_VLAN_103
interface port-channel1
description FP Link to DC1-Agg1
switchport mode fabricpath
interface port-channel2
description FP Link to DC1-Agg2
switchport mode fabricpath
interface Ethernet1/1
description FP EC Member to DC1-Agg1
switchport mode fabricpath
channel-group 1 mode active
interface Ethernet1/2
description FP EC Member to DC1-Agg1
switchport mode fabricpath
channel-group 1 mode active
interface Ethernet1/3
description FP EC Member to DC1-Agg2
switchport mode fabricpath
channel-group 2 mode active
interface Ethernet1/4
description FP EC Member to DC1-Agg2
switchport mode fabricpath
channel-group 2 mode active
interface Ethernet1/23
description to_Ixia2_port_1
switchport access vlan 101
spanning-tree port type edge
interface mgmt0
ip address 10.60.17.106/24
fabricpath domain default
fabricpath switch-id 111
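
In the samples above, every VLAN that the edge switch declares as a FabricPath VLAN (101 to 103) is also declared in FabricPath mode on the spines, while VLAN 333 exists only on the spines for IGP peering. The sketch below shows one way to compare the two sets from saved configuration text. It is a minimal illustration that assumes each `mode fabricpath` line follows the `vlan <id>` line it belongs to; the function name and the abbreviated excerpts are hypothetical.

```python
def fabricpath_vlans(config: str) -> set[int]:
    """Collect the VLAN IDs that are followed by a `mode fabricpath` line."""
    vlans = set()
    current = None
    for raw in config.splitlines():
        line = raw.strip()
        parts = line.split()
        if len(parts) == 2 and parts[0] == "vlan" and parts[1].isdigit():
            current = int(parts[1])
        elif line == "mode fabricpath" and current is not None:
            # `switchport mode fabricpath` does not match this exact line.
            vlans.add(current)
    return vlans


# Abbreviated excerpts of the spine and edge VLAN sections shown above.
SPINE = """vlan 101
mode fabricpath
vlan 102
mode fabricpath
vlan 103
mode fabricpath
vlan 333
mode fabricpath
"""
EDGE = """vlan 1
vlan 101
mode fabricpath
vlan 102
mode fabricpath
vlan 103
mode fabricpath
"""

if __name__ == "__main__":
    missing = fabricpath_vlans(EDGE) - fabricpath_vlans(SPINE)
    print("edge FabricPath VLANs missing on spine:", sorted(missing))
```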

Edge Cisco Nexus 5500 Switch (DC1-5500-1) with vPC+ and FEX A/A

install feature-set fabricpath
feature-set fabricpath
hostname DC1-5500-1
feature telnet
no feature ssh
cfs eth distribute
feature interface-vlan
feature lacp
feature vpc
feature lldp
feature fex
fex 101
pinning max-links 1
description "FEX0101"
fex 102
pinning max-links 1
description "FEX0102"
slot 101
provision model N2K-C2232P
slot 102
provision model N2K-C2232P
vrf context management
ip route 0.0.0.0/0 10.60.17.254
vlan 101
mode fabricpath
name FP_VLAN_101
vlan 102
mode fabricpath
name FP_VLAN_102
vlan 103
mode fabricpath
name FP_VLAN_103
vpc domain 2
role priority 110
peer-keepalive destination 10.60.17.107
fabricpath switch-id 1122
interface port-channel1
description FP Link to DC1-Agg1
switchport mode fabricpath
speed 10000
interface port-channel2
description FP Link to DC1-Agg2
switchport mode fabricpath
speed 10000
interface port-channel3
description vPC+ Peer-Link
switchport mode fabricpath
vpc peer-link
interface port-channel101
description vPC to FEX101
switchport mode fex-fabric
fex associate 101
vpc 101
interface port-channel102
description vPC to FEX102
switchport mode fex-fabric
fex associate 102
vpc 102
interface Ethernet1/1
description FP EC Member to DC1-Agg1
switchport mode fabricpath
interface Ethernet1/2
description FP EC Member to DC1-Agg1
switchport mode fabricpath
interface Ethernet1/3
description FP Link to DC1-Agg2
switchport mode fabricpath
interface Ethernet1/4
description FP Link to DC1-Agg2
shutdown
switchport mode fabricpath
interface Ethernet1/5
switchport access vlan 101
speed 1000
interface Ethernet1/15
switchport mode fabricpath
switchport trunk allowed vlan 101-103
channel-group 3
interface Ethernet1/16
switchport mode fabricpath
switchport trunk allowed vlan 101-103
channel-group 3
interface Ethernet1/23
switchport access vlan 101
spanning-tree port type edge
interface Ethernet1/31
switchport mode fex-fabric
fex associate 101
channel-group 101
interface Ethernet1/32
switchport mode fex-fabric
fex associate 102
channel-group 102
interface mgmt0
ip address 10.60.17.106/24
interface Ethernet101/1/1
switchport access vlan 101
interface Ethernet102/1/1
switchport access vlan 102
fabricpath domain default
fabricpath switch-id 111

Edge Cisco Nexus 5500 Switch (DC1-5500-2) with vPC+ and FEX A/A

install feature-set fabricpath
feature-set fabricpath
hostname DC1-5500-2
feature telnet
no feature ssh
cfs eth distribute
feature interface-vlan
feature vpc
feature lldp
feature fex
fex 101
pinning max-links 1
description "FEX0101"
fex 102
pinning max-links 1
description "FEX0102"
vrf context management
ip route 0.0.0.0/0 10.60.17.254
vlan 101
mode fabricpath
name FP_VLAN_101
vlan 102
mode fabricpath
name FP_VLAN_102
vlan 103
mode fabricpath
name FP_VLAN_103
vpc domain 2
peer-keepalive destination 10.60.17.106
fabricpath switch-id 1122
interface port-channel3
switchport mode fabricpath
vpc peer-link
interface port-channel101
switchport mode fex-fabric
fex associate 101
vpc 101
interface port-channel102
switchport mode fex-fabric
fex associate 102
vpc 102
interface Ethernet1/1
description FP Link to DC1-Agg1
switchport mode fabricpath
channel-group 1 mode active
interface Ethernet1/2
description FP Link to DC1-Agg1
switchport mode fabricpath
channel-group 1 mode active
interface Ethernet1/3
description FP Link to DC1-Agg2
switchport mode fabricpath
channel-group 2 mode active
interface Ethernet1/4
description FP Link to DC1-Agg2
switchport mode fabricpath
channel-group 2 mode active
interface Ethernet1/8
switchport access vlan 101
interface Ethernet1/11
switchport mode fabricpath
speed 1000
interface Ethernet1/15
description vPC+ Peer-Link Member
switchport mode fabricpath
channel-group 3
interface Ethernet1/16
description vPC+ Peer-Link Member
switchport mode fabricpath
channel-group 3
interface Ethernet1/23
switchport access vlan 101
spanning-tree port type edge
interface Ethernet1/24
switchport access vlan 101
spanning-tree port type edge
interface Ethernet1/31
switchport mode fex-fabric
fex associate 102
channel-group 102
interface Ethernet1/32
switchport mode fex-fabric
fex associate 101
channel-group 101
interface mgmt0
ip address 10.60.17.107/24
interface Ethernet101/1/1
switchport access vlan 101
interface Ethernet102/1/1
switchport access vlan 102
fabricpath domain default
ip route 0.0.0.0/0 10.1.101.254
fabricpath switch-id 112