This document presents a list of recommendations that help to implement
a safe network with regard to bridging for Cisco Catalyst switches that run
Catalyst OS (CatOS) and Cisco IOS? Software. This
document discusses some of the common reasons that Spanning Tree Protocol (STP)
can fail and the information for which to look to identify the source of the
problem. The document also shows the kind of design that minimizes spanning
tree-related issues and is easy to troubleshoot.
There are no specific requirements for this document.
This document is not restricted to specific software and hardware
For more information on document conventions, refer to the
Technical Tips Conventions.
This document does not discuss the basic operation of STP. To learn how
STP works, refer to this document:
This document does not discuss Rapid STP (RSTP), defined in IEEE
802.1w. Also, this document does not discuss Multiple Spanning Tree (MST)
protocol, defined in IEEE 802.1s. For more information on RSTP and MST, refer
to these documents:
For a more specific STP troubleshooting document for Catalyst switches
that run Cisco IOS Software, refer to the document
STP on Catalyst Switch Running Cisco Integrated IOS (Native
The primary function of the spanning-tree algorithm (STA) is to cut
loops that redundant links create in bridge networks. The STP operates at Layer
2 of the Open System Interconnection (OSI) model. By means of bridge protocol
data units (BPDUs) that exchange between bridges, the STP elects the ports that
eventually forward or block traffic. This protocol can fail in some specific
cases, and troubleshooting the resulting situation can be very difficult, which
depends on the design of the network. In this particular area, you perform the
most important part of the troubleshooting before the problem occurs.
A failure in the STA generally leads to a bridging loop. Most customers
Technical Support for spanning tree problems suspect a bug, but a bug is
seldom the cause. Even if the software is the problem, a bridging loop in an
STP environment still comes from a port that should block, but instead forwards
Refer to the
Spanning Tree Flash animation
to see an example that
explains how the Spanning Tree initially converges. The example also explains
why a blocked port goes into the forwarding mode because of an excessive loss
of BPDUs, resulting in STA failure.
The rest of this document lists the different situations that can cause
the STA to fail. Most of these failures relate to a massive loss of BPDUs. The
loss causes blocked ports to transition to forwarding mode.
Duplex mismatch on a point-to-point link is a very common configuration
error. If you manually set the duplex mode to Full on one side of the link and
leave the other side in autonegotiation mode, the link ends up in half-duplex.
(A port with duplex mode set to Full no longer
The worst-case scenario is when a bridge that sends BPDUs has the
duplex mode set to half-duplex on a port, but the peer port on other end of
link has the duplex mode set to full-duplex. In the above example, the duplex
mismatch on the link between bridge A and B can easily lead to a bridging loop.
Because bridge B has configuration for full-duplex, it does not perform carrier
sense before link access. Bridge B starts to send frames even if bridge A is
already using the link. This situation is a problem for A; bridge A detects a
collision and runs the backoff algorithm before the bridge attempts another
transmission of the frame. If there is enough traffic from B to A, every packet
that A sends, which includes the BPDUs, undergoes deferment or collision and
eventually gets dropped. From an STP point of view, because bridge B does not
receive BPDUs from A any more, bridge B has lost the root bridge. This leads B
to unblock the port connected to bridge C, which creates the loop.
Whenever there is a duplex mismatch, these error messages are on the
switch consoles of Catalyst switches that run CatOS and Cisco IOS
CDP-4-DUPLEXMISMATCH: Full/half duplex mismatch detected on port [mod]/[port]
%CDP-4-DUPLEX_MISMATCH: duplex mismatch discovered on FastEthernet5/1 (not half
duplex), with TBA05071417(Cat6K-B) 4/1 (half duplex).
Check the duplex settings and, if the duplex configuration does not
match, set the configuration appropriately.
For more information on how to troubleshoot a duplex mismatch, refer to
and Troubleshooting Ethernet 10/100/1000Mb Half/Full Duplex
Unidirectional links are a common cause of a bridging loop. On fiber
links, a failure that goes without detection often causes unidirectional links.
Another cause is a problem with a transceiver. Anything that can lead a link to
stay up and provide a one-way communication is very dangerous with regard to
STP. This example clarifies:
Here, suppose the link between A and B is unidirectional. The link
drops traffic from A to B while the link transmits traffic from B to A. Assume
that bridge B was blocking before the link became unidirectional. However, a
port can only block if it receives BPDUs from a bridge that has a higher
priority. Since, in this case, all the BPDUs that come from A are lost, bridge
B eventually transitions its port toward A to forwarding state and forwards
traffic. This creates a loop. If this failure exists at startup, the STP does
not converge correctly. In the case of a duplex mismatch, a reboot helps
temporarily; but in this case, a reboot of the bridges has absolutely no
In order to detect the unidirectional links before the creation of the
forwarding loop, Cisco designed and implemented the UniDirectional Link
Detection (UDLD) protocol. This feature can detect improper cabling or
unidirectional links on Layer 2 and automatically break resulting loops by
disabling some ports. Run UDLD wherever possible in a bridged
For more information on the use of UDLD, refer to the document
and Configuring the Unidirectional Link Detection Protocol
Packet corruption can also lead to the same kind of failure. If a link
has a high rate of physical errors, you can lose a certain number of
consecutive BPDUs. This loss can lead a blocking port to transition to
forwarding state. You do not see this case very often because STP default
parameters are very conservative. The blocking port needs to miss BPDUs for 50
seconds before the transition to forwarding. The successful transmission of a
single BPDU breaks the loop. This case commonly occurs with the careless
adjustment of STP parameters. An example of an adjustment is max-age
Duplex mismatch, bad cables, or incorrect cable length can cause packet
corruption. Refer to the document
Switch Port and Interface Problems for an explanation of CatOS and Cisco
IOS Software error counter output.
STP is implemented in software, even on high-end switches that perform
most of the switching functions in hardware with specialized
application-specific integrated circuits (ASICs). If for any reason there is an
overutilization of the CPU of the bridge, resources can be inadequate for the
transmission of BPDUs. The STA is generally not processor-intensive and has
priority over other processes. The Look for
Resource Errors section of this document provides some guidelines on the
number of instances of STP that a particular platform can handle.
PortFast is a feature that you typically enable only for a port or
interface that connects to a host. When the link comes up on this port, the
bridge skips the first stages of the STA and directly transitions to the
Caution: Do not use the PortFast feature on switch
ports or interfaces that connect to other switches, hubs, or routers.
Otherwise, you may create a network loop.
In this example, device A is a bridge with port p1 already forwarding.
Port p2 has a PortFast configuration. Device B is a hub. As soon as you plug
the second cable into A, port p2 goes to forwarding mode and creates a loop
between p1 and p2. This loop stops as soon as p1 or p2 receives a BPDU that
puts one of these two ports in blocking mode. But there is a problem with this
kind of transient loop. If the looped traffic is very intensive, the bridge can
have trouble with the successfully transmission of the BPDU that stops the
loop. This problem can delay the convergence considerably or bring down the
network in extreme cases.
For more information on the correct use of PortFast on switches that
run CatOS and Cisco IOS Software, refer to the document
PortFast and Other Commands to Fix Workstation Startup Connectivity
Even with PortFast configuration, the port or interface still
participates in STP. If a switch with a lower bridge priority than that of the
current active root bridge attaches to a PortFast-configured port or interface,
it can be elected as the root bridge. This change of root bridge can adversely
affect the active STP topology and can render the network suboptimal. To
prevent this situation, most Catalyst switches that run CatOS and Cisco IOS
Software have a feature with the name BPDU Guard. BPDU Guard disables a
PortFast-configured port or interface if the port or interface receives a
For more information on the use of the BPDU Guard feature on switches
that run CatOS and Cisco IOS Software, refer to the document
Tree Portfast BPDU Guard Enhancement.
An aggressive value for the max-age parameter and the forward delay can
lead to a very unstable STP topology. In such cases, the loss of some BPDUs can
cause a loop to appear. Another issue that is not well known relates to the
diameter of the bridge network. The conservative default values for the STP
timers impose a maximum network diameter of seven. This maximum network
diameter restricts how far away from each other bridges in the network can be.
In this case, two distinct bridges cannot be more than seven hops away from
each other. Part of this restriction comes from the age field that BPDUs
When a BPDU propagates from the root bridge toward the leaves of the
tree, the age field increments each time the BPDU goes though a bridge.
Eventually, the bridge discards the BPDU when the age field goes beyond maximum
age. If the root is too far away from some bridges of the network, this issue
can occur. This issue affects convergence of the spanning tree.
Take special care if you plan to change STP timers from the default
value. There is danger if you try to get faster reconvergence in this way. An
STP timer change has an impact on the diameter of the network and the stability
of the STP. You can change the bridge priority to select the root bridge, and
change the port cost or priority parameter to control redundancy and load
Cisco Catalyst software provides you with macros that finely tune the
most important STP parameters for you:
spantree root [secondary]
macro command decreases the
bridge priority so that it becomes root (or alternate root). An additional
option is available for this command that results in tuning of the STP timers
by specifying the diameter of your network. Even when correctly done, timer
tuning does not significantly improve the convergence time and introduces some
instability risks in the network. Also, this kind of tuning has to be updated
each time a device is added into the network. Keep the conservative default
values, which are familiar to network engineers.
command for CatOS or the
command for Cisco IOS Software increases the
switch priority so that the switch cannot be root. The command decreases the
STP convergence time in the event of an uplink failure. Use this command on a
distribution switch with dual connection to some core switches. Refer to the
and Configuring the Cisco UplinkFast Feature.
spantree backbonefast enable
command for CatOS or the
command for Cisco IOS Software can increase
the STP convergence time of the switch in the event of an indirect link
failure. BackboneFast is a Cisco proprietary feature. Refer to the document
and Configuring Backbone Fast on Catalyst
For more information on STP timers and the rules to tune them when
absolutely necessary, refer to the document
and Tuning Spanning Tree Protocol Timers.
As mentioned in the Introduction, the STP
is one of the first features that was implemented in Cisco products. You can
expect this feature to be very stable. Only interaction with newer features,
such as EtherChannel, has caused STP to fail in some very specific cases that
have now been addressed. A number of different factors can cause a software bug
and can have a number of different effects. There is no way to adequately
describe the issues that a bug can introduce. The most dangerous situation that
arises from software errors is if you ignore some BPDUs or, generally speaking,
you have a blocking port transition to forwarding.
Unfortunately, there is no systematic procedure to troubleshoot an STP
issue. However, this section sums up some of the actions that are available to
you. Most of the steps in this section apply to the troubleshooting of bridging
loops in general. You can use a more conventional approach to identify other
failures of the STP that lead to a loss of connectivity. For example, you can
explore the path that the traffic that experiences a problem takes.
Note: Most of these troubleshooting steps assume connectivity to the
different devices of the bridge network. This connectivity means you have
console access. During a bridging loop, for example, you probably cannot make a
If you have the output of a show-tech
support command from your Cisco device, you can use
(registered customers only)
to display potential issues and
Before you troubleshoot a bridging loop, you need to know these items,
The topology of the bridge network
The location of the root bridge
The location of the blocked ports and the redundant
This knowledge is essential for at least these two reasons:
In order to know what to fix in the network, you need to know how the
network looks when it works correctly.
Most of the troubleshooting steps simply use
show commands to try to identify error conditions.
Knowledge of the network helps you focus on the critical ports on the key
It used to be that a broadcast storm could have a disastrous effect on
the network. Today, with high-speed links and devices that provide switching at
the hardware level, it is not likely that a single host, for example, a server,
brings down a network through broadcasts. The best way to identify a bridging
loop is to capture the traffic on a saturated link and check that you see
similar packets multiple times. Realistically, however, if all users in a
certain bridge domain have connectivity issues at the same time, you can
already suspect a bridging loop.
Check the port utilization on your devices and look for abnormal
values. Refer to the Check Port Utilization
section of this document.
On the Catalyst switches that run CatOS, you can easily check the
overall backplane usage with the
command. The command provides the current usage of
the switch backplane and also specifies the peak usage and date of peak usage.
An unusual peak utilization shows you whether there has ever been a bridging
loop on this device.
Bridging loops have extremely severe consequences on a bridge network.
Administrators generally do not have time to look for the cause of the loop and
prefer to restore connectivity as soon as possible. The easy way out in this
case is to manually disable every port that provides redundancy in the network.
If you can identify a part of the network that is affected most, begin to
disable ports in this area. Or, if possible, initially disable ports that
should be blocking. Each time you disable a port, check to see if you have
restored connectivity in the network. By identifying which disabled port stops
the loop, you also identify the redundant path where this port is located. If
this port should have been blocking, you have probably found the link on which
the failure appeared.
If you cannot precisely identify the source of the problem, or if the
problem is transient, enable the logging of STP events on the bridges and
switches of the network that experiences the failure. If you want to limit the
number of devices to configure, at least enable this logging on devices that
host blocked ports; the transition of a blocked port is what creates a
You can also try to send the debug output to a syslog device.
Unfortunately, when a bridging loop occurs, you seldom maintain connectivity to
a syslog server.
The critical ports to investigate first are the blocking ports. This
section provides a list of what to look for on the different ports, with a
quick description of the commands to issue for switches that run CatOS and
Cisco IOS Software.
Especially on blocked ports and root ports, check that you receive
BPDUs periodically. Several issues can lead to a port failure to receive
packets or BPDUs.
Cisco IOS Software—In Cisco IOS Software Release 12.0 or later,
output of the
spanning-tree bridge-group #
command has a BPDU field. The field shows you
the number of BPDUs received for each interface. Issue the command an
additional one or two times to determine if the device receives BPDUs.
If you do not have the BPDU field in
the output of
command, you can enable STP debug with
command to verify the receipt of
command tells you
the number of multicast packets that a specific port receives. But the simplest
command to use is the
spantree statistics module#/port#
command. This command displays the exact
number of configuration BPDUs that a specific port received, on a specific
VLAN. A port can belong to several VLANs, if trunking. See the
An Additional CatOS Command section of this
To look for a duplex mismatch, you must check each side of the
An interface with traffic overload can fail to transmit vital BPDUs. A
link overload also indicates a possible bridging loop.
Cisco IOS Software—Use the command
to determine the utilization on an interface.
Several fields help you with this determination, such as
load and packets
input/output. Refer to the document
Switch Port and Interface Problems for an explanation of the
show interfaces command output.
statistics about packets that a port receives and sends. The
command automatically evaluates the port utilization
over a 30-second period and displays the result. The command classifies the
results by percentage bandwidth utilization, although other options for results
classification are available. Also, the
command gives an indication of the backplane
utilization, even though the command does not point to a specific
Cisco IOS Software—Look for error increments in the
input errors counter of the
command. The error counters include
runts, giants, no buffer, CRC, frame, overrun,
and ignored counts.
Refer to the document
Switch Port and Interface Problems for an explanation of the
show interfaces command output.
gives you some
details with the Align-Err, FCS-Err, Xmit-Err,
Rcv-Err, and Undersize fields.
provides statistics in even more detail.
spantree statistics module#/port#
gives very accurate information about a
specific port. Issue this command on ports that you suspect and pay special
attention to these fields:
Forward trans count—This counter
remembers how many times a port transitions from learning to forwarding. In a
stable topology, this counter always shows 1. This counter resets to 0 as the
port goes down and up. So, a value that is higher than 1 indicates that the
transition experienced by the port is the result of an STP recalculation. The
transition is not the result of a direct link failure.
Max age expiry count—This counter
tracks the number of times that the max age expired on this link. Basically, a
port that expects BPDUs waits for max age before the port considers the
designated bridge to be lost. The max age default is 20 seconds. Each time this
event occur, the counter increments. When the value is not 0, it indicates that
the designated bridge for this LAN is unstable or has a problem with the
transmission of BPDUs.
A high CPU utilization can be dangerous for a system that runs the STA.
Use this method to check that the CPU resource is adequate for a device:
There is a limitation on the number of different instances of STP that
a Supervisor Engine can handle. Ensure that the total number of logical ports
across all instances of STP for different VLANs does not exceed the maximum
number supported for each Supervisor Engine type and memory
command for switches that run CatOS or
spanning-tree summary totals
command for switches that run
Cisco IOS Software. These commands display the number of logical ports or
interfaces per VLAN in the STP Active column.
The total appears at the bottom of this column. The total represents the sum of
all logical ports across all instances of STP for the different VLANs. Make
sure that this number does not exceed the maximum number supported for each
Supervisor Engine type.
Note: The formula to compute the sum of logical ports on the switch
(number of non-ATM trunks * number of active Vlans on that trunk)
+ 2*(number of ATM trunks * number of active Vlans on that trunk)
+ number of non-trunking ports
For a summary of the restrictions for STP that apply to Catalyst
switches, refer to these documents:
Troubleshooting is a matter of identifying what is currently wrong in
the network. Disable as many features as possible. The disablement helps
simplify the network structure and eases the identification of the problem. For
example, EtherChanneling is a feature that requires STP to logically bundle
several different links into a single link; the disablement of this feature
during troubleshooting makes sense. As a general rule, making the configuration
as simple as possible makes troubleshooting the problem easier.
show processes cpu
Very often, information about the location of root is not available at
troubleshooting time. Do not leave the STP to decide which bridge is root. For
each VLAN, you can usually identify which switch can best serve as root. This
depends on the design of the network. Generally, choose a powerful bridge in
the middle of the network. If you put the root bridge in the center of the
network with direct connection to the servers and routers, you generally reduce
the average distance from the clients to the servers and
This diagram shows:
If bridge B is root, link A to C is blocked on bridge A or bridge C.
In this case, hosts that connect to switch B can access the server and the
router in two hops. Hosts that connect to bridge C can access the server and
the router in three hops. The average distance is two and one-half
If bridge A is root, the router and the server are reachable in two
hops for both hosts that connect on B and C. The average distance is now two
The logic behind this simple example transfers to more complex
Important Note: For each VLAN, hard code the root
bridge and the backup root bridge with a reduction in the value of the STP
priority parameter. Or you can use the
Plan the organization of your redundant links. Forget about the
plug-and-play feature of the STP. Tune the STP cost parameter to decide which
ports block. This tuning is usually not necessary if you have a hierarchical
design and a root bridge in a good location.
Important Note: For each VLAN, know which ports should
be blocking in the stable network. Have a network diagram that clearly shows
each physical loop in the network and which blocked ports break the
Knowledge of the location of redundant links helps you identify an
accidental bridging loop and the cause. Also, knowledge of the location of
blocked ports allows you to determine the location of the error.
The only critical action that STP takes is the blocking of the ports. A
single blocking port that mistakenly transitions to forwarding can melt down a
large part of the network. A good way to limit the risk inherent in the use of
the STP is to reduce the number of blocked ports as much as possible.
You do not need more than two redundant links between two nodes in a
bridge network. However, this kind of configuration is
Distribution switches are dual-attached to two core switches. Users
that connect on distribution switches are only in a subset of the VLANs
available in the network. In this example, users that connect on Dist 2 are all
in VLAN 2; Dist 3 only connects users in VLAN 3. By default, trunks carry all
the VLANs defined in the VLAN Trunk Protocol (VTP) domain. Only Dist 2 receives
unnecessary broadcast and multicast traffic for VLAN 3, but it is also blocking
one of its ports for VLAN 3. The result is three redundant paths between Core A
and Core B. This redundancy results in more blocked ports and a higher
likelihood of a loop.
Important Note: Prune any VLAN that you do not need
off your trunks.
VTP pruning can help, but this kind of plug-and-play feature is not
necessary in the core of the network.
In this example, only an access VLAN is used to connect the
distribution switches to the core:
In this design, only one port is blocked per VLAN. Also, with this
design, you can remove all redundant links in just one step if you shut down
Core A or Core B.
Layer 3 switching means routing approximately at the speed of
switching. A router performs two main functions:
A router builds a forwarding table. The router generally exchanges
information with peers by way of routing protocols.
A router receives packets and forwards them to the correct interface
based on the destination address.
High-end Cisco Layer 3 switches are now able to perform this second
function, at the same speed as the Layer 2 switching function. If you introduce
a routing hop and create an additional segmentation of the network, there is no
speed penalty. This diagram uses the example in the section
Prune VLANs That You Do Not Use as a
Core A and Core B are now some Layer 3 switches. VLAN 2 and VLAN 3 are
no longer bridged between Core A and Core B, so there is no possibility for an
Redundancy is still present, with a reliance on Layer 3 routing
protocols. The design ensures a reconvergence that is even faster than
reconvergence with STP.
There is no longer any single port that the STP blocks. Therefore,
there is no potential for a bridging loop.
There is no speed penalty, as leaving the VLAN by Layer 3 switching
is as fast as bridging inside the VLAN.
There is a single drawback with this design. Migration to this kind of
design generally implies a rework of the addressing scheme.
Even if you have succeeded with the removal of all the blocked ports
from your network and you do not have any physical redundancy, do not disable
STP. STP is generally not very processor-intensive; packet switching does not
involve the CPU in most Cisco switches. Also, the few BPDUs that are sent on
each link do not significantly reduce the available bandwidth. However, a
bridge network without STP can melt down in a fraction of a second if an
operator makes an error on a patch panel, for example. Generally, disabling the
STP in a bridge network is not worth the risk.
A Cisco switch typically has a single IP address that binds to a VLAN,
known as the administrative VLAN. In this VLAN, the switch behaves like a
generic IP host. In particular, every broadcast or multicast packet is
forwarded to the CPU. A high rate of broadcast or multicast traffic on the
administrative VLAN can adversely impact the CPU and the CPU ability to process
vital BPDUs. Therefore, keep user traffic off the administrative VLAN.
Until recently, there was no way to remove VLAN 1 from a trunk in Cisco
implementation. VLAN 1 generally serves as an administrative VLAN, where all
switches are accessible in the same IP subnet. Though useful, this setup can be
dangerous because a bridging loop on VLAN 1 affects all trunks, which can bring
down the whole network. Of course, the same problem exists no matter which VLAN
you use. Try to segment the bridging domains with use of high-speed Layer 3
As of CatOS version 5.4 and Cisco IOS Software Release 12.1(11b)E, you
can remove VLAN 1 from trunks. VLAN 1 still exists, but it blocks traffic,
which prevents any loop possibility.