The Internet Protocol Journal, Volume 13, No.1

Extending Router Lifetime with Virtual Aggregation

by Paul Francis, Max Planck Institute for Software Systems, and Xiaohu Xu, Huawei Technologies

Biologists believe that human life is limited by the number of times cells can replicate; noncancerous cells have a kind of internal counter that prevents them from replicating forever. Even if humans are kept healthy in every respect, they will eventually die simply because their cells will cease to replicate. Internet routers also have a finite lifetime. They are built with a fixed amount of hardware memory for storing the forwarding table (the memory structure that tells the router where to forward any IP packet, also called the Forwarding Information Base [FIB]). As the Internet global routing table grows, it eventually overflows the FIB, and the router ceases to be able to hold the full routing table. Even if the router is healthy in every respect (all of its hardware components still operate), it can no longer function as a router in the Internet Default-Free Zone (DFZ), where no default routes can be used.

In the past, router vendors have been reasonably good at predicting how long FIBs will last because the growth of the global DFZ routing table has stayed fairly predictable. As a result, Internet Service Providers (ISPs) can plan their capital budgets, and where necessary use a set of tricks (discussed in the next section) to squeeze additional life out of routers even after their "FIB death." But there are two problems.

First, these tricks work only in limited situations, they require extra configuration, and they can lead to increased traffic loads. Second, and potentially much more serious, the rate of routing table growth may dramatically accelerate in the near future, thus shrinking the lifetime of the installed router base. This expected acceleration is due to the imminent exhaustion of IPv4 addresses. In the past, address authorities such as the American Registry for Internet Numbers (ARIN) could assign large contiguous blocks of addresses to ISPs, which in turn assigned smaller blocks to their customers. Therefore, routers in other ISPs' networks need only a single routing table entry—that of the large block—to route to destinations in the ISP. This approach to scaling is called address aggregation. There is a fear that, as IPv4 addresses become increasingly unavailable, ISPs will start buying and selling smaller and smaller blocks of IP addresses from each other in an effort to squeeze out as many addresses as possible. These small blocks will appear all over the Internet thus significantly increasing the size of the routing table.

This article describes a new routing technology, called Virtual Aggregation (VA), which mitigates these problems. It makes extending the lifetime of old routers much easier, and makes it possible for existing routers to absorb a surge in the routing table size. Virtual Aggregation is a working item in the Global Routing Operations Working Group (GROW) working group of the IETF [7], and is documented in draft-ietf-grow-va [6] and related drafts.

Tricks for Keeping Old Routers Deployed

ISPs frequently want to extend the usefulness of a router beyond its "FIB death," and there are many tricks for doing just this. The most common is to structure the ISP in a core-edge arrangement. In this setup, a core of routers forms the backbone of the network. Edge routers connect to other networks and feed into the core. In many cases these edge routers do not need to know how to route to everything in the Internet. Rather, they often need to know only what addresses are reachable in their directly connected networks.

For instance, Figure 1 shows an ISP whose edge routers connect to three types of other networks: customer networks, peer ISP networks, and transit ISP networks. Each customer network has only one or a small number of address prefixes. The edge routers connecting customer networks must know what addresses are reachable in the customer networks, but everything else can be "default routed" to the core. Likewise, the routers connected to peer ISPs need to know how to route to the peers' customer addresses. Everything else can be defaulted to the core. The core routers and the edge routers that connect to transit ISPs, however, need to know how to route to everything.

Figure 1: With a core-edge style of deployment, some routers need to keep full routing tables, while others can keep partial routing tables and default route everything else to the ISP core.

A common practice is for ISPs to delegate FIB-dead routers to the customer or peer edges, and to have the core routers filter the routing information given to the edge routers. For instance, router A in Figure 1 learns the addresses reachable in customer network A (say, 20.1.1.0/24) and conveys them to the ISP core, but the core tells router A only that "everything else" is reachable through it (0.0.0.0/0). But what if customer A itself wants the full DFZ routing table? For instance, customer A might be multihomed to some other ISP, and might want to know which Internet destinations are best reachable through each ISP. To do this, it needs to receive the whole routing table from each ISP, a situation that, of course, cannot happen if the core withholds routes from router A.

As another example, what if two peer ISPs later decide that they want to offer transit service to each other? Now additional routes need to be conveyed to the peer-connected edge routers (router B), and this process may not be possible with limited FIB.

Another way an ISP can shrink its routing table is to default route to its transit ISPs. For instance, routers keep track only of how to route to customers and peers, and everything else is defaulted to the transit ISPs. When this default routing is done, even an ISP's core routers do not need the full routing table. A simple approach is for an ISP to send all defaulted packets to the nearest transit ISP. This process, however, may result in many packets taking a longer Internet path than necessary. Reference [1] describes a more complicated approach where the ISP maintains "semidefaults" for different transit networks in order to improve its global routing while reducing routing table size by about half. This approach, however, can be hard to manage.

In addition, any form of ISP-level default (simple or complex) results in sending extra traffic to the transit ISPs. A substantial amount of Internet traffic is targeted to nonroutable prefixes. When an ISP has the full routing table, it can identify this traffic and drop it before sending it to its transit ISPs. When an ISP defaults, it sends this traffic to its transit ISPs, and pays for it.

To summarize, dealing with FIB-dead routers leads to more complex management, limitations in business arrangements with peers and customers, poor paths over the Internet, and increased traffic load.

The Idea of Virtual Aggregation

In its simplest form, Virtual Aggregation allows an ISP to use FIB-dead routers as edge routers, in any edge router position (neighbor is a transit provider, a peer, or a customer) without limiting what routing information is exchanged. Configuration requirements are minimal. In a more complex form, Virtual Aggregation allows all ISP routers (not just edge routers) to be FIB-dead routers, without requiring ISP-level default routing.

Virtual Aggregation uses two basic mechanisms, FIB suppression and tunneling. Before discussing FIB suppression, a small amount of background is needed. Internet routers have a "data plane" and a "control plane." The data plane is what forwards packets, and includes such functions as header parsing, FIB lookup, queuing, and packet transmission. The control plane operates the background protocols that gather much of the information needed by the data plane. Examples include routing protocols such as the Border Gateway Protocol (BGP) and Open Shortest Path First (OSPF), and tunnel establishment protocols such as the Label Distribution Protocol (LDP).

The idea of FIB suppression is that the control plane operates as normal, but that certain routing table entries are not loaded into the FIB. This idea exploits the fact that it is (data plane) FIB memory, not control plane routing table memory that is the more severe bottleneck. By allowing the control plane to operate as normal, no changes are required to routing protocols or, for the most part, the management of routing protocols.

Tunneling is used to pass packets through routers that have suppressed FIB entries. The principle is illustrated in Figure 2. Here router A tells router B that it can reach 20.1.1.0/24. Router B in turn tells router C that router A can reach 20.1.1.0/24. As a result, router C tunnels packets destined for 20.1.1.0/24 to router A through router B. In other words, it wraps the IP header in another IP or a Multiprotocol Label Switching (MPLS) header that first gets the packet to router A. Router A strips that header, and sends the packet toward the destination. Notice that router B can suppress the route to 20.1.1.0/24 from the FIB—it only needs to know how to route the packet to router A. In other words, even though router B fully participates in the control plane, it is able to shrink its FIB through FIB suppression and tunneling.

Figure 2: Because router C tunnels the packet to router A, router B does not need to know how to forward packets with addresses in 20.1.1.0/24.

Virtual Aggregation in Practice, Simple Version

In the simplest version of Virtual Aggregation, a core-edge configuration is used. The core routers maintain full FIB tables. The edge routers FIB-install at least a default route to the core, and potentially additional routes if there is space in the FIB. This process is illustrated in Figure 3. Here there are two core routers, C1 and C2, and four edge routers, E1, E2, E3, and E4. The edge routers have external neighbors, N1, N2, and N3, as shown.

Figure 3: Packets can be delivered to 20.1.1.0/24 even if none of the edge routers has a FIB entry for 20.1.1.0/24.

The operation is best explained by example. Suppose that N2 advertises a route to destination 20.1.1.0/24 to E3 using External BGP (eBGP) and giving itself as the next hop. E3 in turn advertises this route to the other internal routers using Internal BGP (iBGP), with the next hop still as N2. The core routers install this route in their FIBs, with an indication that packets matching the route should be tunneled to the next hop, N2. Assume for now that all edge routers FIB-suppress the entry. When a packet for say 20.1.1.1 arrives at E1 from N1, E1 does not find an entry for 20.1.1.0/24, but does find the default route 0/0 telling it to forward the packet to its core router C1. C1 looks into its FIB and indeed finds an entry for 20.1.1.0/24 telling it to tunnel the packet to N2. C1 wraps the packet in another header, typically IP or MPLS, addressed to N2. When the packet reaches E3, however, E3 notes that the header directs it to send the packet to N2, strips off the outer header, and sends the packet to N2. E3 can do this without a FIB entry for 20.1.1.0/24.

MPLS already has all the mechanisms needed to perform this packet forwarding. E3 can use LDP to signal a Label Switched Path (LSP) to N2, and Penultimate Hop Popping can be used to strip off the MPLS header before forwarding the packet to the external neighbor N2 (as described in section 4.1.4 of [4]).

Alternatively, stacked MPLS label technology can be used; for example, the inner label is signaled with BGP (see "Carrying Label Information in BGP-4" [3]) while the outer label is signaled with LDP. Here E3 sets itself as the next hop for all the routes learned from external neighbors (for example, 20.1.1.0/24) when advertising them to its iBGP peers, and uses the inner label to identify the external neighbor (see section 4.3, "Label Stacks and Implicit Peering" of [4]). IP-in-IP tunneling can also be used, in this case signaled with softwires BGP attributes [5].

Now let's see what happens if a packet to 20.1.1.1 is received by E3 from external neighbor N3. If E3 has not FIB-installed the route for 20.1.1.0/24, it uses its default entry and forwards the packet to C2. C2 finds its entry for 20.1.1.0/24, which instructs it to tunnel the packet to N2. The packet is sent back to E3, which strips off the outer header and delivers the packet to E2. In this case, the packet has traveled an extra hop and back, a process that is not acceptable if done too much. As long as there is space in the FIB, however, routers are free to FIB-install additional routes. A good policy is to always install routes when external neighbors are the next hop. This policy avoids the longer path. In some cases, such as edge routers that connect to transit networks, there may not be enough FIB space to hold all routes from all external neighbors. In this case, the router may FIB-install the routes for which the most traffic is forwarded. Studies have shown that a small number of routes account for majority of the traffic, making Virtual Aggregation a very efficient solution [2].

Note that this simple form of Virtual Aggregation is very easy to configure. Essentially all that is needed is to tell the routers that they are using simple Virtual Aggregation, and to tell them if they are a core or an edge router. The routers can automatically configure everything else. Virtual Aggregation requires configuration of tunnels from every router to every other router, but these configurations also can be automatic. In any event, increasingly these tunnels are created anyway for the purpose of traffic engineering.

Simple Virtual Aggregation solves most of the problems described earlier. It can save FIB on any edge router without having to compromise BGP service to customers or flexibility in using peer networks for some transit. It also allows FIB-dead routers to be used as edge routers with transit ISPs. Finally, it prevents the need for ISP-level default routing to transits, thus avoiding unnecessarily sending unroutable traffic to the transit. And it does all this with much less configuration than is required to operate with FIB-dead routers today.

Virtual Aggregation in Practice, Complex Version

The simple version of Virtual Aggregation is satisfactory for edge routers, but it does nothing to reduce FIB size on core routers. What if an ISP wishes to also extend the lifetime of its core routers? Or wants to move away from a core-edge model, and rather connect all edge routers directly through a Layer 2 substrate like MPLS?

What if indeed there is a surge in routing table growth, thus causing ISPs all over the world to suddenly find themselves FIB-starved? There is a version of Virtual Aggregation that allows for FIB reduction in any and all routers in an ISP network.

The basic idea is to divide the address space so that different routers maintain full routes within different parts of the address space. So for instance, rather than have all core routers responsible for all of the address space, you could have half of the core routers responsible for the lower half of the address space, and the other half of the core routers responsible for the upper half of the address space. Figure 4 shows how this setup would look for the simple topology of Figure 3, keeping in mind that this example is rather simplistic.

Figure 4: In a complex version of Virtual Aggregation, even core routers do not need to hold the full routing table.

Assume that C2 FIB-installs only the lower half of the address space (0.0.0.0/1) and C1 FIB-installs the upper half (128.0.0.0/1). With this arrangement, the edge routers have two defaults instead of one. Packets to addresses in 0.0.0.0/1 are defaulted, through a tunnel, to C2, and packets to addresses in 128.0.0.0/1 are defaulted to C1. These defaults are learned simply by having C1 and C2 advertize their respective default routes with themselves as the next hop in iBGP.

As with the previous example, assume that router N2 advertises a route to 20.1.1.0/24, with itself as the next hop, to E3. E3 advertises this route to all other routers using iBGP. Only C2, however, FIB-installs this route—C1 suppresses it. When a packet to 20.1.1.1 arrives at E1, it looks in its FIB, finds a matching route to 0.0.0.0/1, and so tunnels the packet to C2. C2 terminates the tunnel, finds its FIB entry for 24.1.1.0/24, and tunnels the packet toward N2. E3 uses the tunnel information to know to forward the packet to N2, strips away the tunnel header, and forwards the packet to N2.

Now suppose that a packet for 20.1.1.1 arrives at E3 from N3. Ideally E3 has already automatically FIB-installed the route for 24.1.1.0/24 either because its external neighbor provides the next hop, or because the route is a high-volume destination. In this case, of course, the packet is directly forwarded to N2. However, if E3 has not FIB-installed the route, then its best match is the default to 0.0.0.0/1, and it tunnels the packet to C2. C2 in turns tunnels the packet back toward N2 through E3 as described before. Worse, if C1 rather than C2 FIB-installed the lower half of the address space, the packet would have detoured all the way to C1. Clearly these routes are not optimal, and so we must ask how nonoptimal would the complex version of Virtual Aggregation be in real ISPs.

The USENIX NSDI paper [2] answers this question for one large transit ISP. In this study, both the topology and the traffic matrix of the ISP are considered. The deployment strategy is substantially more complex than the simplistic example given previously. An upper limit is placed on the maximum increase in latency (5 ms) for any path through the ISP. There is a requirement that within a Point of Presence (PoP) at least two routers must cover the same address space. The number and size of address partitions are engineered to spread FIB load evenly. The "additional" routes installed in the FIB are designed to cover high-traffic destinations to the extent possible.

With these requirements in mind, this study found that FIB size could be reduced in all routers by at least an order of magnitude with a negligible increase (1–2%) in overall traffic load due to the occasional extra hops from the detours. This result ultimately translated into an increased router lifetime of easily 10 years.

The management requirements for the complex version are substantially greater than those for the simple version. The address partitions must be chosen, the routers assigned to address partitions must be chosen, and possibly some strategy for deciding what "additional" routes should be FIB-installed is needed. Whether this added configuration and the associated difficulties due to, for instance, misconfiguration are worth the cost savings for extending router lifetime is up to each ISP. Virtual Aggregation at least provides an option that was not previously available.

Status

Virtual Aggregation is a working-group item in the Global Routing Operations Working Group (GROW) in the IETF. The primary draft is draft-ietf-grow-va [6]. This draft has gone through several revisions, and is very close to its final form. Huawei is currently implementing Virtual Aggregation. A second open-source implementation has been built by Paul Francis' research group for the Quagga open-source routing platform, and is still being enhanced.

Acknowledgements

The authors would like to thank the co-authors of the Virtual Aggregation drafts, Hitesh Ballani, Dan Jen, Robert Raszuk, and Lixia Zhang. In particular, it was Robert who suggested the simple version of Virtual Aggregation.

References

[1] Andre Chapuis, "BGP Filtering," Presentation from SWINOG7, www.swinog.ch/meetings/swinog7/BGP_filtering-swinog.ppt

[2] Hitesh Ballani, Paul Francis, Tuan Cao, and Jia Wang, "Making Routers Last Longer with ViAggre," USENIX NSDI 2009, April 2009.

[3] Y. Rekhter and E. Rosen, "Carrying Label Information in BGP-4," RFC 3107, May 2001.

[4] E. Rosen, A. Viswanathan, and R. Callon, "Multiprotocol Label Switching Architecture," RFC 3031, January 2001.

[5] P. Mohapatra and E. Rosen, "BGP Encapsulation SAFI and BGP Tunnel Encapsulation Attribute," RFC 5512, April 2009.

[6] P. Francis, X. Xu, H. Ballani, D. Jen, R. Raszuk, and L. Zhang, "FIB Suppression with Virtual Aggregation," October 2009, draft-ietf-grow-va-01.

[7] IETF Global Routing Operations Working Group (GROW), http://www.ietf.org/dyn/wg/charter/grow-charter.html

PAUL FRANCIS is a faculty member at the Max Planck Institute for Software Systems in Germany. He has been active periodically in the IETF for nearly 20 years. Dr. Francis has held research positions at Cornell University, ACIRI, NTT Labs, and Bellcore, and was Chief Scientist at Fast Forward Networks and Tahoe Networks. E-mail: francis@mpi-sws.org

XIAOHU XU graduated from Beijing University of Posts and Telecoms in 2000. He has been working in the telecom industry for about 10 years and now is a research engineer with IP Advanced Technology Research Department of Huawei Technologies. Before joining Huawei at the end of 2004, he was the chief engineer of the Technical Support Department for Harbour Networks. E-mail: xuxh@huawei.com