

Field Notice: Endless BGP Convergence Problem in Cisco IOS Software Releases


October 10, 2000


Products Affected

Product                            Comments
All Cisco IOS® Software Releases   Affects Border Gateway Protocol (BGP) users only.

Problem Description

A few of Cisco's customers have recently noticed a problem with their BGP convergence. The problem is best described as "an endless BGP convergence problem," where BGP enters a repeating cycle of sending and receiving updates and/or withdraws. This results in a BGP network that churns and never completely converges.

Background

A BGP Autonomous System (AS) may be affected if it meets all of the following criteria:

  • Uses either route reflectors in a single tier, or confederations in a single level. It does not use a hierarchical design.

  • Accepts multi-exit discriminators (MEDs) from more than one AS for a prefix.

  • Does not use the bgp deterministic-med command.

  • Does not follow the Deployment Guidelines for route reflectors or confederations.

This endless cycling error happens as a result of the following:

  • Not following the Deployment Guidelines that are documented in the Route Reflector Requests for Comments (RFC) and in the Autonomous System Confederations for BGP draft.

  • Not using the bgp deterministic-med command.

If your network meets this description, then you need to make changes to your network so that it conforms with the BGP standards.

Problem Symptoms

Identifying the Problem

The more prefixes you receive that meet the above criteria, the more churn you will have in your network if you have neither taken steps to comply with the route reflector and confederation standards nor enabled the bgp deterministic-med command. To see the churn in action in your network, do the following:

  1. Execute the show ip route bgp | include , 00:00 command every 60 seconds for five minutes. This command shows all of the routes that have been modified within the past 60 seconds. If a prefix shows up in the output of this command more than three times, then there is a chance that the prefix is in the endless convergence cycle.

    Example output:

    Router# show ip route bgp | include , 00:00
    B 2.6.4.0/22 [200/1] via 8.3.4.18, 00:00:58
    B 3.1.4.0/24 [200/1] via 6.2.0.9, 00:00:28
    B 3.8.8.0/22 [200/1] via 7.5.2.5, 00:00:58
    B 3.8.6.0/23 [200/1] via 7.5.2.5, 00:00:58
    B 3.9.4.0/24 [200/200000000] via 5.1.4.9, 00:00:24
    Router#
    
  2. Execute a show ip bgp a.b.c.d | include best # command for one of the prefixes that you saw repeating in step 1. For this example, assume that the 3.8.6.0/23 route showed up three or more times in the output of step 1. We want to watch this prefix for several minutes to see if the best path changes for this prefix are in a repeating cycle. Run the show ip bgp a.b.c.d | include best # command over and over for several minutes to determine if this path is in a convergence loop.

    Example output:

    Currently, our best path is #17.

    Router# show ip bgp 3.8.6.0 | include best #
    Paths: (23 available, best #17)
    Router# show ip bgp 3.8.6.0 | include best #
    Paths: (23 available, best #17)
    Router# show ip bgp 3.8.6.0 | include best #
    Paths: (23 available, best #17)
    Router# show ip bgp 3.8.6.0 | include best #
    Paths: (23 available, best #17)
    

    Then, the best path changes to #14.

    Router# show ip bgp 3.8.6.0 | include best #
    Paths: (23 available, best #14)
    

    Next, the best path changes to #18.

    Router# show ip bgp 3.8.6.0 | include best #
    Paths: (24 available, best #18)
    

    Now, the best path is #17 again.

    Router# show ip bgp 3.8.6.0 | include best #
    Paths: (23 available, best #17)
    Router# show ip bgp 3.8.6.0 | include best #
    Paths: (23 available, best #17)
    Router# show ip bgp 3.8.6.0 | include best #
    Paths: (23 available, best #17)
    

If we continue to run the show ip bgp 3.8.6.0 | include best # command for several minutes, we may see the best path change from 17 to 14, from 14 to 18, and from 18 back to 17 several times. If we do see this repetition, then it is safe to say the 3.8.6.0/23 route is in an update loop.
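The two checks above can be automated once the command output has been captured. The following is a minimal sketch in Python, assuming the output of each run is already available as a text string (how it is collected from the router is outside the sketch). The function names churning_prefixes, best_path_sequence, and looks_like_loop are hypothetical helpers, not part of Cisco IOS Software or any Cisco tool.

```python
import re
from collections import Counter

# Matches a BGP route line such as "B 3.8.6.0/23 [200/1] via 7.5.2.5, 00:00:58".
ROUTE_RE = re.compile(r"^\s*B\s+(\d+\.\d+\.\d+\.\d+/\d+)\s")
# Matches the summary line "Paths: (23 available, best #17)".
BEST_RE = re.compile(r"best #(\d+)")


def churning_prefixes(samples, threshold=3):
    """Return prefixes that appear in more than `threshold` captured samples."""
    counts = Counter()
    for output in samples:
        seen = set()
        for line in output.splitlines():
            match = ROUTE_RE.match(line)
            if match:
                seen.add(match.group(1))
        counts.update(seen)  # count each prefix once per sample
    return sorted(p for p, n in counts.items() if n > threshold)


def best_path_sequence(readings):
    """Collapse repeated readings into the sequence of distinct best paths."""
    sequence = []
    for output in readings:
        match = BEST_RE.search(output)
        if match:
            best = int(match.group(1))
            if not sequence or sequence[-1] != best:
                sequence.append(best)
    return sequence


def looks_like_loop(sequence):
    """True if the best path changed and later returned to an earlier value."""
    return len(sequence) > len(set(sequence))
```

With the readings from the example (17, 14, 18, then 17 again), best_path_sequence gives [17, 14, 18, 17] and looks_like_loop reports True. A single return to an earlier best path is only a hint. As described above, the pattern should repeat several times before the prefix is considered to be in an update loop.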

Even if you are unable to find a prefix that is in the update loop, you should still make sure that your network design conforms to the BGP standards and uses the bgp deterministic-med command.

Example of the Endless BGP Convergence Problem in a Route Reflector Environment

Given the following topology and the following set of paths for a prefix (10.0.0.0/8 in this example), BGP will enter an update loop that never ends.

fn12942-01.gif

In the above figure:

The (1) represents the Interior Gateway Protocol (IGP) cost from RR_C to RR_D.

The RR_C is a route reflector for rrc_A and rrc_B. These three routers are in cluster 13.0.0.1.

The RR_D is a route reflector for rrc_E. These two routers are in cluster 12.0.0.1.

The AS, MED, and IGP values shown at the bottom of the above figure represent the updates advertised by rrc_A, rrc_B, and rrc_E for the 10.0.0.0/8 network. The 10.0.0.0/8 network is originated by AS 100, which is not shown on this diagram.

AS: = The AS_PATH of the update

MED: = The MED of the update

IGP: = The IGP cost to the BGP NEXT_HOP from RR_C or RR_D. For example, the IGP cost to 14.0.0.1, which is a BGP NEXT_HOP, is ~5 or "around 5." In other words, it may be 5 for RR_C, but it may be 6 for RR_D.

Note:  For the following steps 1 through 5, the best path will be marked with an asterisk (*).

  1. The RR_C has the following in its BGP table for 10.0.0.0/8 with the "10 100" route marked as best:

      AS              MED   IGP cost to next-hop
      6 100           1     4
    * 10 100          10    5
    

    The "10 100" path should not be marked as best, but this is not the cause of the update loop.

    The router realizes it has the wrong route marked as best since the "6 100" path has a lower IGP metric. The RR_C makes this change and sends an update to its neighbors to let them know it considers the "6 100, 1, 4" route as best.

  2. The RR_D receives the update from RR_C, which leaves RR_D with the following in its BGP table:

      AS              MED   IGP cost to next-hop
    * 6 100           0     12
      6 100           1     4
    

    The RR_D then marks the "6 100, 0, 12" path as best because it has a lower MED. The RR_D then sends an update to its neighbors to let them know that this is its best path.

  3. The RR_C receives the update from RR_D. The RR_C now has the following in its BGP table:

      AS              MED   IGP cost to next-hop
      6 100           0     12
      6 100           1     4
    * 10 100          10    5
    

    Path 1 beats path 2 because of its lower MED, and then path 3 beats path 1 because of its lower IGP cost to the next hop. The RR_C sends an update to its peers to let them know this is its best path.

  4. The RR_D receives the update from RR_C, which leaves RR_D with the following in its BGP table:

      AS              MED   IGP cost to next-hop
      6 100           0     12
    * 10 100          10    5
    

    The RR_D selects the "10 100, 10, 5" route as best because of the IGP metric. The RR_D sends an update/withdraw to its peers to let them know this is its best path.

  5. The RR_C receives the withdraw from RR_D, which leaves RR_C with the following in its BGP table:

      AS              MED   IGP cost to next-hop
      6 100           1     4
    * 10 100          10    5
    

    The RR_C received a withdraw for "6 100, 0, 12", which changes what is considered the best path for RR_C. We do not recompute the best path for a prefix when we receive a withdraw, unless our best path was withdrawn. This is why RR_C has the "10 100, 10, 5" path selected as best, even though the "6 100, 1, 4" path is better.

    At this point, we have made a full loop and are back to step 1. The router realizes it is using the incorrect best path, and the cycle repeats.
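The cycle can also be reproduced in a small model. The following is a minimal sketch in Python and not a description of how Cisco IOS Software implements BGP. It makes several assumptions that are not taken from the figure: the IGP cost inside the RR_D cluster to the rrc_E next hop is 11, the cost seen from the other cluster is the local cost plus the RR_C to RR_D link cost, the reflectors take turns processing updates, and only the MED and IGP steps of the decision process are modeled (all AS_PATHs in the example have the same length). The best path is chosen per neighbor AS first, so the result does not depend on table order. All names in the code are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    neighbor_as: int   # first AS in the AS_PATH (6 or 10 in the example)
    med: int
    learned_from: str  # client that injected the path: "A", "B", or "E"


PATH_A = Path(6, 1, "A")    # "6 100", MED 1, from rrc_A
PATH_B = Path(10, 10, "B")  # "10 100", MED 10, from rrc_B
PATH_E = Path(6, 0, "E")    # "6 100", MED 0, from rrc_E


def best_path(paths, igp):
    """Lowest MED within each neighbor AS, then lowest IGP cost overall."""
    winners = {}
    for path in paths:
        current = winners.get(path.neighbor_as)
        if current is None or path.med < current.med:
            winners[path.neighbor_as] = path
    return min(winners.values(), key=lambda p: igp[p.learned_from])


def simulate(link_cost, rounds=8):
    """Alternate RR_D and RR_C decisions; return RR_C's best path per round."""
    # Assumed costs: 4 and 5 inside RR_C's cluster, 11 inside RR_D's cluster.
    igp_c = {"A": 4, "B": 5, "E": 11 + link_cost}
    igp_d = {"A": 4 + link_cost, "B": 5 + link_cost, "E": 11}
    from_c = None  # path RR_C currently advertises to RR_D
    history = []
    for _ in range(rounds):
        best_d = best_path([PATH_E] + ([from_c] if from_c else []), igp_d)
        # A reflector passes only client-learned paths to the other reflector.
        from_d = best_d if best_d is PATH_E else None
        best_c = best_path([PATH_A, PATH_B] + ([from_d] if from_d else []), igp_c)
        from_c = best_c if best_c in (PATH_A, PATH_B) else None
        history.append(best_c.learned_from)
    return history
```

Under these assumptions, simulate(1) alternates between the rrc_B path and the rrc_A path on every round, which mirrors the repeating cycle in steps 1 through 5. With an inter-cluster cost that is higher than every assumed intra-cluster cost, for example simulate(20), RR_C stays on the rrc_B path in every round.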

Example in a Confederations Environment

Given the following topology and the following set of paths for a prefix (10.0.0.0/8 in this example), BGP will enter an update loop that never ends.

fn12942-02.gif

The (1) represents the IGP cost from router C to router D.

The AS, MED, and IGP values shown at the bottom represent the updates advertised by routers A, B, and E for the 10.0.0.0/8 network. The 10.0.0.0/8 network is originated by AS 100, which is not shown on this diagram.

AS: = The AS_PATH of the update.

MED: = The MED of the update.

IGP: = The IGP cost to the BGP NEXT_HOP from router C or D. For example, the IGP cost to 14.0.0.1, which is a BGP NEXT_HOP, is ~3 or "around 3." In other words, it may be 3 for router C, but it may be 4 for router D.

Routers A, B, and C are in Sub AS 65000. There are no route reflectors, so routers A, B, and C are in a full mesh.

Routers D and E are in Sub AS 65001. They are regular internal BGP (IBGP) peers.

  1. Router C has the following in its BGP table:

      AS              MED   IGP cost to next-hop
    * 10 100          10    3
      (65001) 6 100   0     6
      6 100           1     2
    

    The "10 100" path is selected as best and advertised to Router D.

  2. Router D has the following in its BGP table:

      AS              MED   IGP cost to next-hop
      6 100           0     5
    * (65000) 10 100  10    4
    

    The "(65000) 10 100" route is selected as best. As a result, Router D sends a withdraw to Router C for the "6 100" route that it had previously advertised.

  3. Router C receives the withdraw from Router D.

    Router C now has the following in its BGP table:

      AS              MED   IGP cost to next-hop
    * 10 100          10    3
      6 100           1     2
    
    

    Router C received a withdraw for "(65001) 6 100," which changes what is considered the best path for Router C. The router does not recompute the best path for a prefix when it receives a withdraw, unless its best path was withdrawn. This is why Router C has the "10 100, 10, 3" path selected as best, even though the "6 100, 1, 2" path is better.

  4. Router C realizes that the "6 100" path is better because of its lower IGP metric. Router C sends a withdraw to Router D for the "10 100" path since Router C is now using the "6 100" path as its best path.

      AS              MED   IGP cost to next-hop
      10 100          10    3
    * 6 100           1     2
    
  5. Router D receives the update from Router C.

    Router D now has the following in its BGP table:

      AS              MED   IGP cost to next-hop
      (65000) 6 100   1     3
    * 6 100           0     5
    

    Router D selects the "6 100, 0, 5" path as best because of MED. Router D sends an update to Router C.

  6. Router C receives the update from Router D.

    Router C now has the following in its BGP table:

      AS              MED   IGP cost to next-hop
    * 10 100          10    3
      (65001) 6 100   0     6
      6 100           1     2
    

    At this point, we have made a full cycle and are back to step 1.

Workaround/Solution

Solutions

To solve the update loop problem, use one of the six options below, and enable the bgp deterministic-med command. The preferred options are Option 1 and Option 2.

  1. Use confederations, and make the inter-sub-as links have a higher IGP metric than the intra-sub-as IGP metrics.

  2. Use route reflectors where clusters are clearly defined, and make sure any inter-cluster link (including links from a route reflector cluster to a route reflector in another cluster) has a higher IGP metric than the intra-cluster IGP metrics.

  3. Do not accept MEDs altogether.

  4. Modify policies so that the BGP decision algorithm never gets to the MED step. The most feasible way to do this would be to modify LOCAL_PREF to force the decision to be made there. This also defeats MEDs, so Option 3 may be a better choice.

  5. Enable the bgp always-compare-med command.

    Note: This partially defeats the purpose of MEDs, so Option 3 may be a better choice.

  6. Use a full IBGP mesh. This is not a feasible solution for an AS with a large number of BGP routers, but it may work fine for smaller networks.
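To illustrate why the bgp deterministic-med command is part of the solution, the following minimal Python sketch compares two ways of choosing a best path from the three paths that RR_C holds in step 3 of the route reflector example. It is a simplification and not the Cisco IOS Software decision process: only the MED step (applied between paths from the same neighbor AS) and the IGP cost step are modeled, and the names Path, better, sequential_best, and deterministic_best are hypothetical.

```python
from itertools import permutations
from typing import NamedTuple


class Path(NamedTuple):
    as_path: tuple  # for example (6, 100)
    med: int
    igp_cost: int


def better(a, b):
    """MED decides only within the same neighbor AS; otherwise IGP cost decides."""
    if a.as_path[0] == b.as_path[0] and a.med != b.med:
        return a if a.med < b.med else b
    return a if a.igp_cost <= b.igp_cost else b


def sequential_best(paths):
    """Compare paths pairwise in table order (no deterministic MED)."""
    best = paths[0]
    for candidate in paths[1:]:
        best = better(best, candidate)
    return best


def deterministic_best(paths):
    """Pick a winner per neighbor AS first, then compare the winners."""
    groups = {}
    for path in paths:
        groups.setdefault(path.as_path[0], []).append(path)
    winners = [sequential_best(sorted(group)) for _, group in sorted(groups.items())]
    return sequential_best(winners)


TABLE = [
    Path((6, 100), 0, 12),
    Path((6, 100), 1, 4),
    Path((10, 100), 10, 5),
]

# Try every possible table order and collect the distinct outcomes.
sequential_results = {sequential_best(list(order)) for order in permutations(TABLE)}
deterministic_results = {deterministic_best(list(order)) for order in permutations(TABLE)}
```

In this sketch, the pairwise comparison can return any of the three paths depending on the order in which they sit in the table, while the grouped comparison returns the "10 100, 10, 5" path for every order. Grouping makes the choice predictable, but it does not change the IGP metrics, which is why it is combined with one of the options above.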

Verification

When you have modified your network as described in the Solutions section, you will no longer have a BGP update loop. To verify this, repeat the steps described in the Identifying the Problem section of this document, and you will see that the loop has been fixed.

Other Considerations

RFC 2796: BGP Route Reflection - April 2000

Following the Specifications

This update loop condition can be avoided in route reflector and confederation environments if the standard for each technology is followed and the bgp deterministic-med command is enabled throughout the AS.

Please refer to Section 9 of RFC 2796, BGP Route Reflection (April 2000), for more information.

Confederations Standard

The latest confederations draft has a similar requirement. Section 11 of the draft describes the update loop condition that we saw in our example. The draft also states that one solution for this problem is to increase the inter-sub-AS IGP metrics so that they are higher than the intra-sub-AS IGP metrics.

The network in our example was violating this rule because our inter-sub-AS IGP metric was 1, but our highest intra-sub-AS IGP metric was 6. To conform with the confederation standard, our inter-sub-AS IGP metric needs to be at least 7. It would probably be best to make the inter-sub-AS IGP metric higher than 7, in case the highest intra-sub-AS IGP metric increases above 6.
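The arithmetic behind this rule can be written as a short check. The following is a minimal Python sketch, assuming the IGP metrics are available as plain numbers. The function names, the link label "C-D", and the individual intra-sub-AS values (other than the maximum of 6 stated above) are hypothetical.

```python
def min_safe_inter_metric(intra_metrics):
    """Smallest inter-sub-AS metric that is higher than every intra-sub-AS metric."""
    return max(intra_metrics) + 1


def violations(inter_metrics, intra_metrics):
    """Return inter-sub-AS links whose metric does not exceed the intra-sub-AS maximum."""
    limit = max(intra_metrics)
    return {link: metric for link, metric in inter_metrics.items() if metric <= limit}


# Illustrative values: the highest intra-sub-AS metric is 6, the inter-sub-AS link is 1.
intra = [2, 3, 6]
inter = {"C-D": 1}

flagged = violations(inter, intra)
required = min_safe_inter_metric(intra)
```

For these values, the "C-D" link is flagged and the smallest conforming inter-sub-AS metric is 7, which matches the figures above. Choosing a value well above the minimum leaves room for intra-sub-AS metrics to grow.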

For More Information

If you require further assistance, or if you have any further questions regarding this field notice, please contact the Cisco Systems Technical Assistance Center (TAC).
