Caveats in Testing Routing Protocol Convergence - The Internet Protocol Journal - Volume 8, Number 4

by Russ White, Cisco Systems

In general, the main problems we find when testing routing protocols lie in generating accurate (or rather, realistic) data, as well as understanding the limitations of tests geared towards measuring routing protocol performance. Three areas of specific interest are covered in this article: defining convergence, taking realistic measurements, and creating realistic data.

Defining Convergence

The first problem we face when trying to test routing is to define convergence. It seems like a simple question, but it’s not, because there are so many different ways to measure convergence:

  • How long does it take to begin forwarding traffic once a topology change has occurred?
  • How long does it take for every router in the network to adjust to a topology change that has occurred?
  • How long does it take for the forwarding information on a specific router to be updated once a topology change has occurred?
  • How long does it take for the routing protocol to adjust to a topology change?

Each of these is actually a completely different question, as a short examination of the network in Figure 1, below, shows.

Figure 1: Test Network

Assume A is the traffic source for a test, and H is the sink, or the convergence measurement point. To measure the convergence time of this network, you send a stream of traffic from A to H; when the traffic stabilizes, the C to G link is taken down, and the length of the gap in traffic at H is measured. In this environment, we assume traffic fails over from the C to G link onto the path through E.

This test assumes the traffic between B and H, or between A and B, will not be impacted by the link between C and G failing, but we do not know this will always be the case. In fact, it’s possible that D and F will end up forming a microloop until they receive all the information needed to converge without the C to G link.

This microloop could last longer than C requires to recompute a path to H, so while the traffic from A to H may be successfully delivered, the network may not be in a fully converged state. The topic of microloop formation and avoidance is beyond the scope of this article.
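Seen from the traffic sink, this style of measurement reduces to finding the longest silence in a constant-rate probe stream arriving at H. A minimal sketch in Python; the probe interval and timestamps are invented for illustration:

```python
def convergence_gap(arrivals, interval):
    """Estimate convergence time as the longest silence in a probe
    stream that normally arrives every `interval` seconds."""
    gaps = [b - a for a, b in zip(arrivals, arrivals[1:])]
    worst = max(gaps)
    # Subtract the nominal inter-packet interval so a loss-free
    # stream reports a gap of (roughly) zero.
    return max(worst - interval, 0.0)

# Probes every 10 ms; forwarding stops at ~1.0 s and resumes at 1.25 s.
arrivals = [i * 0.01 for i in range(100)] + [1.25 + i * 0.01 for i in range(100)]
print(round(convergence_gap(arrivals, 0.01), 3))  # 0.25
```

Note that this number only describes the A-to-H path; as the text explains, it says nothing about whether the rest of the network has stabilized.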

In this small network, the time it takes for A to continue forwarding traffic to H may not be the same as the time it takes for the entire network to stabilize after the topology change. How long it takes for A to be able to reach H, and how long it takes for all the routers in the network to adjust to the topology change are two different questions. In this case, the concept of convergence is unclear, with several possible meanings; to properly build and understand the results of the test, we need to better understand the question being asked.

You could alter the test so only A, C, E, G, and H are in the network. This would provide a “clean” test of just the failover capabilities of the routing protocol being tested, as it’s implemented on the specific routers in the network, across the specific link types connecting the routers, in the simple failover situation. While the limited topology does limit the number of outputs being measured in the test, it also limits how closely the tested network resembles a real network design. The test can provide some very specific data points, but, once the test topology is simplified, it cannot provide a true picture of convergence in a larger, more complex topology.

Another option is to refine the test procedure so the traffic between B and H is tested as well as the traffic between A and H. Measuring traffic flow from every possible connected end point to every other possible connected end point on the network provides a number called goodput: the ratio of the traffic the network delivers, across all paths, to the traffic injected into it.
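The goodput computation itself is simple; a sketch in Python, with illustrative packet counts for the two flows:

```python
def goodput(flows):
    """flows: list of (injected_packets, delivered_packets), one entry
    per source/sink pair. Goodput is aggregate delivered over injected."""
    injected = sum(i for i, _ in flows)
    delivered = sum(d for _, d in flows)
    return delivered / injected

# A-to-H loses 120 packets during failover; B-to-H is unaffected.
print(goodput([(10000, 9880), (10000, 10000)]))  # 0.994
```

The hard part, of course, is not the arithmetic but generating and counting the traffic on every path at once.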

Although this type of testing does provide more data in a more complex topology, it also has its drawbacks. For instance, if you are trying to compare two different implementations of a single protocol, or compare two different routing protocols, this test not only counts the amount of time required for the routing protocol to converge, it also tests the amount of time required to note the topology change, the time required to install the newly computed routes into the local routing table, and the time required to pass the changes from the routing table to the local forwarding tables. This might—or might not—be a good thing.

Isolating just the routing protocol can provide information about the performance of a specific implementation of the protocol in specific network designs, and under certain conditions. Including platform and media-specific issues—such as the installation of information into a local table—may cloud the picture. For instance, if the routing protocol can converge in milliseconds, but it takes seconds to determine that the link between C and G has failed, any changes in routing protocol convergence time will be lost in the much larger link failure detection time, reducing the value of the test.

In short, numerous tradeoffs are involved in designing a test to measure routing protocol convergence; you need to begin with the right questions, and understand the tradeoffs in the various tests you could run. There’s no “simple” way to run a single test that will give you all the information you need to understand all possible implementations of a routing protocol on all possible platforms.

In the same way, it’s important to keep these types of limiting factors in mind when reading, or using, test results provided by outside companies. It’s fairly easy to look at a specific test for one measure, such as the number of neighbors a specific implementation of the Border Gateway Protocol (BGP) can support in specific conditions, and attempt to generalize those test results to much larger and varied real world networks. Quite often, the mapping isn’t all that simple.

Taking Realistic Measurements

Assume you determine you want to test for protocol convergence by checking the routing tables at each router in the network in Figure 1, rather than trying to measure convergence by measuring traffic flow through the network. How would you go about doing this? There are two general classes of tests you could consider:

  • Black Box: Treat the device as a black box, only using outside signals and controls, and never any output provided from the device itself.
  • White Box: Use available output provided from the device itself, possibly with tests using signals outside the device, to determine when specific events on the device occur.

Obviously, black box testing is much more difficult, maybe impossible in some conditions, but, at the same time, can provide more “objective” measures of a device’s performance. Examples of black box tests for the Open Shortest Path First (OSPF) protocol are outlined in RFC 4061, RFC 4062, and RFC 4063. White box testing typically depends on debug and show commands to provide timestamped information about when specific events occur, such as when the routing protocol has received information about the topology change, when the routing protocol has finished computing the best path to each destination, and other events.
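A white box test of this kind typically reduces to extracting timestamps from logged events and taking differences. A sketch in Python, using a hypothetical debug-line format; real router debug output differs by vendor and platform:

```python
import re
from datetime import datetime

# Hypothetical debug format: "HH:MM:SS.mmm <event text>".
LINE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) (.+)$")

def event_delta(log_lines, start_event, end_event):
    """Seconds elapsed between two named events in one router's log."""
    stamps = {}
    for line in log_lines:
        m = LINE.match(line)
        if m:
            stamps[m.group(2)] = datetime.strptime(m.group(1), "%H:%M:%S.%f")
    return (stamps[end_event] - stamps[start_event]).total_seconds()

log = [
    "12:00:01.250 LSA received for failed link",
    "12:00:01.410 SPF computation complete",
    "12:00:01.470 routes installed in RIB",
]
print(event_delta(log, "LSA received for failed link", "routes installed in RIB"))  # 0.22
```

As the next sections discuss, the meaning of such differences depends entirely on whose clock produced the timestamps and how well those clocks agree.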

For simplicity, the network is reconfigured with a test measurement device, as shown in Figure 2, below.

Figure 2: Reconfigured Test Network

Some mechanism is used to determine when the routing protocol on each router has computed the correct routes; the network is connected, and allowed to converge. The link between C and G is taken down, and the time between the link failure and the correct routes being computed on C, D, E, F, and G is taken as the total convergence time in the network. This appears to be a straightforward test; what sorts of problems can we run into here?

There are two possible mechanisms for determining when each device has correctly computed the routes after the C to G link fails:

  • Some sort of “continuous output,” such as a debug, can be configured on each router, and the results collected and analyzed.
  • The Tester can poll each device, using show commands, or some black box testing technique, to determine when each device has recalculated the routes correctly.

Let’s examine each of these techniques separately.

Gathering Results from Continuous Router Output

The first, and simplest, mechanism is to gather the results from each router through the debugging information provided by the protocol implementation, which is generally used for troubleshooting and monitoring the routing protocol. There are three primary issues to be aware of when using this information:

  • The continuous stream of information provided by the device being tested can actually impact the test results, primarily because of the processor cycles required to record and display this information. In some situations, the additional cost is negligible, and in others, it’s simply not important (for instance, if the test is designed to show the differential between two situations, rather than provide absolute convergence times).
  • If the timestamps injected by the devices being tested in the network are relied on, then the clocks of every device must be synchronized. This synchronization must generally be within about 1/10th or less of the total variation in the test time for the results to be meaningful. In other words, if the clocks on all the devices are synchronized within one second of each other, and the results of the test are expressed in milliseconds, the actual test results are going to be lost in variations in the clock synchronization.
  • If the devices feed their information to the Tester, and the timestamp on the Tester is used to compare the event times within the network, the timestamps can be skewed by the packet processing requirements of the devices, as well as queuing delays in the Tester. Most routers prioritize routing traffic over switched traffic, and switched traffic over management traffic. There could be significant lags between an event occurring, and the router actually building a packet noting the occurrence of that event. Again, this is a matter of time differentials; if the test results are expressed in milliseconds, queuing delays alone can bury the results in noise.

We need to be careful, then, when using debug or other continuous output to measure network convergence times in any given test. Quite often, we need to compare the granularity of the test results with the measurement technique used, and consider how much noise the measurement technique is likely to inject into the testing environment, compared to the granularity of the test results.
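The 1/10th rule of thumb above can be expressed as a simple check; a sketch in Python, where the factor of 10 is the heuristic from the text rather than a standard, and the NTP-grade figure is an illustrative assumption:

```python
def result_is_meaningful(measured, sync_error, factor=10):
    """Rule of thumb: the clock-synchronization error should be at
    most 1/factor of the measured value for the result to be usable."""
    return sync_error <= measured / factor

# 500 ms convergence measured with clocks synchronized to 1 s: meaningless.
print(result_is_meaningful(0.5, 1.0))    # False
# The same result with ~5 ms clock synchronization: usable.
print(result_is_meaningful(0.5, 0.005))  # True
```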

Polling Devices

Another common technique is to run some sort of process on the Tester which polls each device, either using some black box or white box measurement, to determine when each device finishes recalculating routes after the topology change has occurred. This type of test is also constrained by various factors that might not be obvious when you are designing a test, or examining the results of a test that uses it. Assume events in the network occur as Figure 3 illustrates.

Figure 3: Poll Testing Scenario

In Figure 3A, we assume that the Tester is able to poll every device in the network at the same time, once a second. The test shows the network converged at 4 seconds after the event, although the last router to converge, G, does so just after the 3 second mark. There can be a variation of the entire polling interval in the actual results without the test showing any difference in the convergence time of the network, implying that the polling interval must be much faster than the expected (measured) test results for the results to be meaningful. We normally suggest that the polling interval be about 10 times faster than the expected measurement rate, or that the Tester should poll every 1/10th of a second in this test, if the results are to be measured in seconds.
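The quantization effect described above is easy to model: with simultaneous polls, the test can only report the first poll tick at or after the true convergence time. A Python sketch, using the just-after-3-seconds convergence and one-second polls from Figure 3A:

```python
import math

def reported_convergence(actual, poll_interval):
    """With simultaneous polls every poll_interval seconds, the test
    reports the first poll tick at or after the true event time."""
    return math.ceil(actual / poll_interval) * poll_interval

# G converges just after 3 s, but 1 s polling reports 4 s.
print(reported_convergence(3.05, 1.0))            # 4.0
# Polling ten times faster bounds the error to a tenth of a second.
print(round(reported_convergence(3.05, 0.1), 3))  # 3.1
```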

However, in real test environments, a test device cannot actually poll every device in the network at the same time. Instead, the Tester will poll one device periodically, rotating through the polled devices, so the longest time between polls of any specific device is the polling interval. We can call this rotating polling serialization, and the time it takes to rotate through all the devices the serialization delay. Here, we’ve spread out the polls across the total one second polling time, to illustrate, in Figure 3B. Three anomalies show up in this illustration:

  • The total time for the network to converge is still just over three seconds, while the recorded test time is still in the four second range. This is similar to the problem we noted when we assumed the Tester was polling all the devices in the network at the same time.
  • It appears, from our test results, that E and F have converged at about the same moment. In reality, their convergence is separated by almost one second. In some extreme cases, the devices may actually converge in the opposite order from the order they appear to converge.
  • If the convergence order of D and G were to be reversed, the network would appear to converge almost a half a second faster, although the actual convergence time would remain constant. This could cause a widely diverging set of test results over multiple runs in what is, actually, a fairly consistent network convergence time.
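These serialization anomalies can be reproduced with a small simulation; the convergence times below are invented to mirror Figure 3B, where E and F converge over a second apart yet appear to converge almost together:

```python
def serialized_poll(converge_times, order, interval):
    """Poll devices one at a time in `order`, completing one full
    rotation every `interval` seconds; report the first poll that
    finds each device converged."""
    slot = interval / len(order)
    reported = {}
    t = 0.0
    while len(reported) < len(order):
        for i, dev in enumerate(order):
            poll_time = t + i * slot
            if dev not in reported and poll_time >= converge_times[dev]:
                reported[dev] = poll_time
        t += interval
    return reported

# Invented true convergence times, in seconds after the link failure.
actual = {"C": 0.4, "D": 2.6, "E": 1.45, "F": 2.55, "G": 3.05}
rep = serialized_poll(actual, ["C", "D", "E", "F", "G"], 1.0)
# E (1.45 s) and F (2.55 s) report at 2.4 s and 2.6 s: nearly together.
print({d: round(v, 2) for d, v in rep.items()})
```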

Adding the serialization delay of polling isn’t enough, however, to understand polling in real test environments. We also need to remember that each device which is polled must also answer each one of the polls, thereby introducing another variable amount of delay into the test results. For instance, in Figure 3C, C is polled once before and once after it converges. If we take the time that C answers as its convergence time, then we are also including processing time on C, which is variable, into C’s total convergence time. However, if we take the polling time as C’s convergence time, it’s possible that the poll was received before C converged, and was processed, and answered, after C converged, skewing the results in the opposite direction.

Unfortunately, there are no simple answers to these problems. Instead, when you are designing a test, or examining the results of a test, the mechanism used to determine convergence, the rate at which that mechanism is used, and the reported final results, should be taken together, and considered closely. A test which reports results in milliseconds, but polls a large number of devices from a single test device, should be examined closely for serialization delay errors.

Use Real-Life Configuration Parameters and Prefix Attributes

Finally, we need to consider what is probably one of the most widely disregarded concerns in testing routing protocol implementations: building accurate and repeatable data sets to feed into the test. Let’s examine a common test, to help in understanding this problem.

A network engineer sets up a router connected to a router testing device using a SONET link. The router tester is then configured to feed one million routes, through BGP, to the router being tested. The test is run, and the amount of time it takes for the router to accept and install all of the routes into its local tables is measured. The router is disconnected (we’ll call this first router A), and another router (B) is connected. The same test is performed. In the end, the network engineer proclaims A has a better BGP implementation than B, because A accepted and installed the routes fed to it faster than B.

This sort of test, and these results, should raise a lot of red flags for anyone who’s ever tested routers before. Many questions here are not answered:

  • Were both routers tuned to optimum parameters for this specific test? Most routers are installed in a number of different situations in various networks, and most will perform better if they are tuned to fit the role they are playing in the network. This is similar to tuning a server for database use, or web server use.
  • BGP is very sensitive to the data transmitted from one router to another; BGP implementers are generally aware of this, and use differing models of BGP behavior in different networks to tune their implementations. Specifically, in the case of BGP:
    • What percentage of the prefixes transmitted were of each prefix length (/24s, /23s, and so on)?
    • How many unique attribute sets were represented in the routing information transmitted? For each attribute set, what percentage of the table did that attribute set represent?
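Building a realistic route feed starts with measuring these distributions in a real table. A Python sketch of the two measurements, over a tiny invented route set (real feeds carry hundreds of thousands of routes, and real attribute sets carry more than an AS path):

```python
from collections import Counter

def table_profile(routes):
    """routes: list of (prefix, attribute_tuple) pairs. Returns the
    share of each prefix length and of each distinct attribute set."""
    total = len(routes)
    lengths = Counter(int(p.split("/")[1]) for p, _ in routes)
    attrs = Counter(a for _, a in routes)
    return (
        {l: c / total for l, c in lengths.items()},
        {a: c / total for a, c in attrs.items()},
    )

routes = [
    ("10.0.0.0/24", ("65001 65002",)),
    ("10.0.1.0/24", ("65001 65002",)),
    ("10.1.0.0/23", ("65001 65003",)),
    ("10.2.0.0/16", ("65001 65002",)),
]
lengths, attrs = table_profile(routes)
print(lengths)  # {24: 0.5, 23: 0.25, 16: 0.25}
```

A synthetic feed generated to match these measured distributions exercises an implementation far more realistically than one million uniform prefixes.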

Each of these questions can, and should, be compared to real world measures in the network the router is going to be installed in. There are some instances where protocol implementers have tuned their implementation for use in an Internet Point of Presence (POP), for instance, and the implementation doesn’t fare as well when used as a route reflector, or the other way around. For some vendors, this tuning could even be on a platform-by-platform basis, making the job of characterizing a specific implementation through a simple test, like the one described above, very difficult.


Summary

Designing, executing, and evaluating the results of a test attempting to measure network convergence is much more complex than it appears on the surface. In any given test situation, we need to ask:

  • What was the test designed to measure? Is it measuring the appropriate outputs, in the correct ways, to actually measure this?
  • What is the granularity of the test results and the actual network events, compared with the measurement techniques used in the test? Will normal test results get lost in the noise introduced by the measurement techniques?
  • What is the data set used to build the test? Does it accurately reflect the data the routing protocol implementation will be handling in a real network (or, more specifically, the real network the router will be installed in)?

When designing, or evaluating, test results, there’s a strong tendency to be dogmatic about the results, to say some specific test proves, in some way, that a specific vendor, platform, protocol, or implementation is “better.” When evaluating tests in the real world, however, we need to be cautious of such statements, examine the entire environment, consider test results with skepticism, and try to understand their limits, as well as their results.

For Further Reading

[1] V. Manral, R. White, A. Shaikh, “Benchmarking Basic OSPF Single Router Control Plane Convergence,” RFC 4061, April 2005.

[2] V. Manral, R. White, A. Shaikh, “OSPF Benchmarking Terminology and Concepts,” RFC 4062, April 2005.

[3] V. Manral, R. White, A. Shaikh, “Considerations When Using Basic OSPF Convergence Benchmarks,” RFC 4063, April 2005.

RUSS WHITE works for Cisco Systems in the Routing Protocols Deployment and Architecture (DNA) team in Research Triangle Park, North Carolina. He has previously worked in the Cisco Technical Assistance Center (TAC) and on the Escalation Team, and has coauthored several books on routing protocols, including Advanced IP Network Design, IS–IS for IP Networks, and Inside Cisco IOS Software Architecture. He is currently in the process of publishing a book on BGP deployment, and is the co-chair of the Routing Protocols Security Working Group within the IETF. E-mail: