Graph Overlays on Path Vector: A Possible Next Step in BGP - The Internet Protocol Journal - Volume 8, Number 2

by Russ White, Cisco Systems

Over the past several years, much research and thought has gone into a replacement for the current interdomain routing protocol,
Border Gateway Protocol
(BGP) [1]. For instance:

In 2002, the Internet Research Task Force (IRTF) published a set of requirements for a next-generation interdomain routing protocol. In fact, several sets of requirements documents have been published in this area.
In December 2001, The Cook Report noted that BGP needs to be replaced
In October 2003, the Workshop on Internet Routing Evolution and Design (WIRED) presented papers arguing that BGP needs to be replaced.
In December 2001, the IETF published RFC 3221 [2], authored by Geoff Huston, which provided some background information toward finding a replacement for BGP.

There are probably thousands of references in magazine articles, conference proceedings, and research papers, all stating that BGP should be replaced. Of course, all these discussions wind up at the same place: It is almost impossible to replace BGP, wholesale, in the public Internet, or even in any of the private networks running BGP today.

The basic problem is you cannot take the network down, and you cannot replace the routing protocol without taking the network down. Many very clever ideas have been proposed to get around this problem—complex transition schemes, moving partitions, and all sorts of other concepts. But, in the end, the idea of transitioning from one routing protocol to another on something as large—and as distributed in both geography and ownership—as the Internet, has been a hard wall against which all the proposals for new interdomain routing protocols pile up.

"Another approach is to consider the feasibility of decoupling the requirements of inter-domain connectivity management with the applications of policy constraints and the issues of sender- and receiver-managed traffic-engineering requirements. Such an approach may use a link-state protocol as a means of maintaining a consistent view of the topology of inter-domain network, and then use some form of overlay protocol to negotiate policy requirements of each Autonomous System, and use a further overlay to support inter-domain traffic-engineering requirements."
In this article, we propose building on this concept, but in a novel way: rather than replacing BGP, or attempting to solve all the currently perceived problems with BGP at once, we attempt to address two problems in a way that does not heavily modify day-to-day BGP operation. Rather than replace BGP, enhance it to account for new requirements by providing new capabilities. If done right, this avoids the problem of deploying a new routing protocol altogether, because BGP is already deployed throughout the Internet.

Problems with BGP

No discussion of replacing BGP would be complete without a discussion of why so many people think BGP needs to be replaced. We need to consider three main points in this area:
convergence speed
, and
. Each of these is covered in the following sections.

BGP Convergence Speed

Through various studies, and through examining the way in which BGP works, it has been shown that BGP, in an interdomain environment, always converges roughly in:

(Maximum AS_PATH – Minimum AS_PATH) × Minimum Advertisement Interval
To understand why this is so, let's examine the following small internetwork as it converges.

Let's assume autonomous system (AS) 12 is advertising some destination,, and that every other autonomous system in the internetwork chooses the path to the right to reach that destination. So, for instance, AS4 chooses the path {5,8,11,12} to reach, AS3 chooses the path {7,10,12} to reach, AS2 chooses the path {9,12} to reach, and AS6 chooses the path {12} to reach

At this point, let's examine what happens if AS12 loses its connection to AS12 sends out a withdraw, which reaches AS6, 9, 10, and 11 at about the same time. These autonomous systems then send out withdraws, with the second set of withdraws reaching AS1, 7, and 8 at about the same time.

When AS1 receives this first withdraw, it examines its local table, and finds the next best path to reach is through AS2, with the path {2,9,12}. AS1 does not realize that AS2 has received a withdraw for at the same time it received the first withdraw for this destination from AS6. So, AS1 switches over to its next best path, and continues forwarding traffic to

AS2, 7, and 8 now also send withdraws to each of their peers, including AS1, 3, and 5. AS1 now receives another withdraw, again for the path it is currently using to reach AS1 examines its local tables and finds it has another path, through {3,7,10}, to, so it switches to that path, without knowing AS3 has just received a withdraw for this same path. AS3 and 5 now send withdraws to each of their peers, AS1 and 4. AS1 has again received a withdraw from the peer it is using to reach, so it examines its local tables, and finds it still has a path through {4,5,8,11,12} to reach this destination. It switches to this path, without realizing AS4 has just received a withdraw as well.

AS4 now sends the final withdraw to AS1, removing AS1's final path from its local tables. AS1 now removes all reachability information for, and the network is converged. Note that the actual convergence in this situation would be a bit more complicated, with AS1 sending updates at each stage, and all the other autonomous systems reconverging at each step along the way, but we have used only the simplest set of messages through the network, to illustrate the basic procedure BGP follows when converging.

This short example illustrates why BGP has the convergence characteristics described previously. BGP "hunts" through each possible autonomous-system path, from shorter ones to longer ones, until it finally converges. The rate at which it can hunt through each possible autonomous-system path is determined by the minimum advertisement interval, the rate at which new routing information is allowed to flow through the system.

This problem has several obvious solutions. The first is to simply increase the rate at which routing information flows through the system, by reducing the minimum advertisement interval. But, this plays against route flap dampening, and network stability in general, so, beyond some lower possible bound, reducing the minimum advertisement interval is not possible (without further modifications to BGP).

Another obvious solution is to simply add a "reason code" to the original withdraw. If AS12 originally stated it was withdrawing reachability to because it had lost local connectivity to it, then all paths with AS12 in the path could have been discarded immediately, at the first step. The problem here is making certain the original withdraw message actually makes it through the network, from AS12 all the way to AS1. Because BGP is a very efficient protocol, many control messages of this type are actually removed from the network, through implicit withdraws, aggregation, and other mechanisms.


The second problem we encounter with BGP is its rather rough sense of policy. For instance, let's examine the following small network, and look at one specific example of where policy transmission and enforcement are problematic in BGP.

Here AS2 has a policy that AS3 should never be used for transit. In other words, traffic originated in AS4 should always pass through the large internetwork rather than through AS3 to reach AS1. This type of situation is very common in the public Internet, such as when AS3 is actually AS2's customer. How can AS2 communicate this policy to AS4, however?

AS2 could simply mark the routing information it sends to AS3 so AS3 cannot readvertise it to AS4, but this is problematic. Simple mechanisms, such as marking the routes with the NO_EXPORT community, are easy for AS3 to simply strip off the routing information it receives. We could conceive of some way to cryptographically sign the included policy, so AS3 cannot disturb the policy and AS4 can see the policy when it receives the information from AS3, but this is problematic as well.

Suppose AS3 is receiving aggregated routing information directly from AS5, which includes some of the same destinations AS2 has advertised to AS3, but has blocked AS3 from advertising to AS4. AS3 could, conceivably, readvertise this routing information to AS4, and AS4 could prefer this shorter prefix aggregate to reach the destinations in AS1, rather than the paths through the large internetwork. AS4 would then forward traffic to AS3, which would then rely on its longer prefix routes, received from AS2, to forward this traffic to these destinations in AS1. AS3 is, contrary to AS2's policy, transiting traffic through AS2 to AS1. There is no simple answer to this problem.


It has been widely acknowledged that BGP is an insecure protocol, with many areas where attackers can hijack, inject false routing information, and perform other attacks. The IETF's
Routing Protocols Security
(RPsec) working group is working on a set of documents describing vulnerabilities of BGP, and creating recommendations for systems to secure BGP.

What sort of requirements are likely to come out of such an undertaking?

Any proposed mechanism must be able to show that a specific autonomous system is authorized to originate specific routing information.
Any proposed mechanism must be able to show that the AS Path carried in received routing information corresponds to a real path in the internetwork, beginning with the origin AS and ending in the advertising peer.

There will be many other requirements that proposed mechanisms for providing security for BGP will need to, or should, meet, but these two will be the largest areas of concern for our purposes.

Solving the Problems

Now that we have an idea of the three areas we want to solve problems in, how can we actually solve them? The most elegant solution would be a single mechanism that does not change the current semantics of BGP itself too greatly, would provide greater benefits as it is deployed throughout a large-scale internetwork, and would rely on existing—and understood—techniques within routing.

One perfect example of such a mechanism would be to simply overlay a link state-like graph of interconnectivity over the BGP protocol. This graph would provide information about the interconnections between autonomous systems, rather than between routers, and would be used to convey information about the topology and policies in the internetwork, rather than to find loop-free paths through the internetwork.

Let's go back through our three examples, and see how overlaying an internetwork connection graph would be able to solve some of the problems currently facing BGP.

Convergence Speed

Looking at our small sample internetwork again:

What is the one thing we said would resolve the problems with BGP hunting through every possible longer autonomous-system path alternative to finally converge around loss of reachability to Could AS12, somehow, communicate directly to every autonomous system in the internetwork that it has directly lost this connection, rather than waiting for AS1 to try every possible path to, and discover each one, in turn, withdrawn?

If we had a topological graph of the network, AS12 could simply remove from its connectivity information. AS12 would then flood this information, on an interdomain basis, to all the other autonomous systems in the internetwork at roughly the same time. Thus, in the worst case, AS1 would receive this information at about the same time it received the first withdraw for, from AS6.

When AS1 receives this updated topology information from AS6, it will discover that AS12 is no longer connected to, and, therefore, it can remove every possible path to containing AS12. This would allow AS1 to remove the paths {2,9,12}, {3,7,10,12}, and {4,5,8,11,12} at the same time. The internetwork now converges as soon as AS1 computes the new connectivity graph, and acts on it by examining each entry in its local tables and discarding the ones with AS12 in the autonomous-system path.

We have not changed the way BGP finds paths through the network—the path still is not valid unless we receive an advertisement from our connected peers. We have also not changed the format of any BGP updates, any peering state machines, or anything else. We have simply overlayed an interconnection graph on top of the current protocol mechanisms, which we can use to our advantage to speed up network convergence.

What about partial deployments in this situation? Suppose only autonomous systems 6, 7, 8, 9, 10, 11, and 12 are running this new extension. Would it still help us to speed up network convergence? When AS12 withdraws, AS6, 7, 8, 9, 10, and 11 would immediately discard any routes passing through AS12 to reach At this point, they could each withdraw those routes, meaning AS1, 2, 3, and 5 would all receive a withdraw at about the same time. This short-circuits the number of possible paths for AS1 to hunt through, decreasing the amount of time the internetwork takes to converge. Even without a full deployment, we see some positive impact from this new technique.


Let's examine our policy problem after placing our interconnection graph on top of the internetwork.

Here, we see that AS2 could actually place its policy for AS3 not to transit traffic in the interconnection draft. AS4 would then be able to independently verify what AS2's policy toward AS3 transiting traffic is. AS4 could then examine the routing information it receives from AS3, and determine if it should install—or not install—routing information received from AS3, based on this policy.

Objections to an Interconnection Graph

When a link-state protocol has been proposed as a possible replacement for BGP in the past, two primary objections have been raised:

Providers are reluctant to accept the wholesale replacement of a known working system with a new one.
Many providers wish to hide their policies and connectivity to other providers or customers for policy reasons.

This article does not propose replacing BGP, just augmenting it, so the first argument is, to some degree, not valid against this approach. The second objection, that of using a link-state protocol for interdomain routing specifically, also does not apply, because we are not proposing changing the way BGP finds loop-free paths through the network. The proposed interconnection graph is not used for finding paths through the network, it is used only for faster signaling of path failure (by shortcircuiting the slower withdraw mechanisms), and for providing a place to hang policy and security information.

Concentrating on a few smaller spaces allows us to design a smaller solution set that can be incrementally deployed in a simple way.

The second objection is harder to meet, simply because the concepts of policy within a routing system are hard to define and understand in all possible cases or respects. In fact, there are policy requirements not met by BGP today, but rather are met through contracts, packet filters, and other mechanisms (even sometimes by violating the BGP specification).

Consider two facts about this proposal that work around many of the specific objections we have heard in this area:

The interconnection graph can be partial, in different parts of the internetwork. For instance, a given service provider might provide different views of who they are connected to to different peers, depending on their policy of revealing this information.
The interconnection graph only contains autonomous system-level connectivity information, not specific peering-point information. For instance, two autonomous systems may be connected in a large number of places, or as few as one. The interconnectivity graph does not care about such details, only whether at least one connection exists. Such an interconnectivity graph would not reveal actual connection points between peering autonomous systems, how rich that connectivity is, nor any other information about the business relationship between the two peers.

In fact, the types of interconnectivity information an interconnection graph could provide is already available by examining the autonomous-system paths of routes retrievable from various route view servers. Some mechanism would be required to collate this information into a usable graph, but a good deal of current research on the scaling and convergence properties of large-scale internetworks actually depends on the ability to build an interconnection graph before beginning any other work, so mechanisms to collate this data already exist, and are in use today.


The internetwork interconnection graph can actually show whether a path exists from the origin to the advertising peer, through
signed certificates
. For example, soBGP uses this specific mechanism to validate the autonomous-system path carried in received routing information. Other research is currently being pursued in this area as well.


We have proposed a single step forward that could be used to resolve some of the problems facing BGP in the near term, and possibly provide the networking community with a path forward on other fronts as well. The concept of simply making incrementally deployable changes to BGP to solve pressing problems can provide us with options outside the normal lines of thinking: either making very small changes to BGP, making BGP more and more complicated, or simply replacing the BGP protocol, with all the deployment problems this would entail.


[1] Yakov Rekhter, Tony Li, "A Border Gateway Protocol 4 (BGP-4),"
, March 1995.

[2] Geoff Huston, "Commentary on Inter-Domain Routing in the Internet,"
, December 2001.

RUSS WHITE works for Cisco Systems in the Routing Protocols Deployment and Architecture (Cisco DNA) team in Research Triangle Park, North Carolina. He has worked in the Cisco Technical Assistance Center (TAC) and Escalation Team in the past, has coauthored several books on routing protocols, including
Advanced IP Network Design, IS–IS for IP Networks
, and
Inside Cisco IOS Software Architecture
. He is currently in the process of publishing a book on BGP deployment, and is the co-chair of the Routing Protocols Security Working Group within the IETF. E-mail: