The Internet Protocol Journal, Volume 14, No. 2

IPv6 Site Multihoming

Fred Baker, Cisco Systems

In today's Internet, site multihoming—an edge network configuration that has more than one service provider but does not provide transit communication between them—is relatively common. Per the statistics at www.potaroo.net, almost 40,000 Autonomous Systems are in the network, of which about 5,000 seem to offer transit services to one or more customers. The rest are in terminal positions, possibly meaning three things. They could be access networks, broadband providers offering Internet access to small companies and residential customers; they could be multihomed edge networks; or they might be networks that intend to multihome at some point in the future. The vast majority, on the order of 75 percent, are multihomed or intend to multihome. That is but one measure; you do not have to use Border Gateway Protocol (BGP) routing to have multiple upstream networks. Current estimates suggest that there is one multihomed entity per 50,000 people worldwide, and one per 18,000 in the United States.

We also expect site multihoming to become more common. A current proposal in Japan suggests that each home might be multihomed; it would have one upstream connection for Internet TV, and one or more other connections provided by Internet Service Providers (ISPs), operating over a common Digital Subscriber Line (DSL) or fiber-optic infrastructure. That scenario has one multihomed entity for every four people.

Why do edge networks multihome? Reasons vary. In the Japanese case just propounded, it is a fact of life—users have no other option. In many cases, it is a result of a work arrangement, or a strategy for achieving network reliability through redundancy.

For present purposes, this article considers scaling targets derived from a world of 10 billion people (circa 2050), and a ratio of one multihomed entity per thousand people—on the order of 10,000,000 multihomed entities at the edge of the Internet. Those estimates may not be accurate 40 years from now, but given current trends they seem like reasonable guesses.

RFC 1726 [1], the technical criteria considered in the selection of what at the time was called IP Next Generation (IPng), did not mention multihoming per se. Even so, among the requirements are scalable and flexible routing, of which multihoming is a special case. When IPv6 was selected as the "next generation," multihoming was one of the topics discussed. The Internet community has complained that this particular goal was not fulfilled. Several proposals have been proffered; unfortunately, each has benefits, and each has concerns. No single perfect solution is universally accepted.

In this article, I would like to look at the alternatives proposed and consider the effects they have. In this context, the goals set forth in RFC 3582 [2] are important; many people tried to state what they would like from a multihoming architecture, and the result was a set of goals that solutions only asymptotically approach.

The proposals considered in this article include:

  • Provider Independent Addressing, also known as BGP Multihoming
  • Exchange-Based Addressing
  • Shim6, also known as Level 3 Multihoming
  • Identifier-Locator Network Protocol (ILNP)
  • Network Prefix Translation, also known as NAT66

BGP Multihoming

BGP Multihoming involves a mechanism relatively common in the IPv4 Internet; the edge network either becomes a member of a Regional Internet Registry (RIR) [APNIC, RIPE, LACNIC, AFRINIC, ARIN] and from that source obtains a Provider-Independent (PI) prefix, or obtains a Provider-Allocated (PA) prefix from one provider and negotiates contracts with others using the same prefix. In either case, it advertises the prefix in BGP, meaning that all ISPs, including (in the PA case) the provider that allocated it, must carry it as a separate route in their routing tables.

The benefit to the edge is easily explained, and in the case of large organizations it is substantial. Consider the case of Cisco Systems, whose internal network rivals medium-sized ISPs for size and complexity. With about 30 Points of Attachment (PoAs) to the global Internet, and at least as many service providers, Cisco has an IPv6 /32 PI prefix, and hundreds of offices to interconnect using it. One possible way to enumerate the Cisco network would be to use the next five bits of its address (32 /37 prefixes) at its PoAs, and allocate prefixes to its offices by the rule that if their default route is to a given PoA, their addresses are derived from that PoA. By advertising the PoA's /37 and a backup /32 into the Internet core at each PoA, Cisco could obtain effective global routing. It would also obtain relative simplicity for its internal network: only one subnet is needed on any given Local-Area Network (LAN) regardless of provider count or addressing, and routing can be optimized independently from the outside world.
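The enumeration described above can be sketched with Python's standard ipaddress module. This is only an illustration: 2001:db8::/32 is the IPv6 documentation prefix standing in for a real /32 PI allocation, and the helper names and the sample office prefix are hypothetical.

```python
import ipaddress

# Assumption: the documentation prefix stands in for the real /32 PI prefix.
PI_PREFIX = ipaddress.ip_network("2001:db8::/32")

# The next five bits give 2**5 = 32 prefixes of length /37, one per
# Point of Attachment (PoA).
POA_PREFIXES = list(PI_PREFIX.subnets(new_prefix=37))


def announcements(poa_index):
    """Routes one PoA would advertise: its own /37 plus the backup /32."""
    return [POA_PREFIXES[poa_index], PI_PREFIX]


def poa_for_office(office_prefix):
    """Return the index of the PoA whose /37 covers an office prefix."""
    for index, prefix in enumerate(POA_PREFIXES):
        if office_prefix.subnet_of(prefix):
            return index
    raise ValueError("office prefix is outside the PI allocation")


if __name__ == "__main__":
    print(len(POA_PREFIXES))  # 32
    print(announcements(1))
    # Hypothetical office whose default route points at PoA 1.
    print(poa_for_office(ipaddress.ip_network("2001:db8:800:100::/56")))  # 1
```

Each PoA announces a more specific /37 for the offices behind it, while the shared /32 keeps every office reachable through any PoA if a more specific route is withdrawn.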

The problem that arises with PI addressing, if taken to its logical extreme, is that the size of the routing table explodes. If every edge network obtains a PI prefix (neglecting for the moment both BGP traffic engineering and the kind of de-aggregation suggested in Cisco's case), the logical outcome of enumerating the edge is a routing table with on the order of 10^7 routes. The memory required to store the routing table, and in the Secure Interdomain Routing (SIDR) case the certificates that secure it, is one of the factors in the cost of equipment. The volume of information also affects the time it takes to advertise a full routing table, and in the end the amount of power that a router uses, the heat it produces, and a switching center's air conditioning requirements. Thus both the capital cost of equipment used in transit networks and the cost of operations would be affected. In effect, the Internet becomes the "poster child" for the Tragedy of the Commons.
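The order-of-magnitude estimate can be reproduced from the scaling targets stated earlier in this article. The following is a back-of-the-envelope sketch, not a forecast; it assumes exactly one PI prefix per multihomed entity and, for comparison, uses the approximate count of Autonomous Systems cited in the introduction.

```python
# Figures taken from the scaling targets stated earlier in the article.
WORLD_POPULATION = 10_000_000_000      # about 10 billion people, circa 2050
PEOPLE_PER_MULTIHOMED_ENTITY = 1_000   # one multihomed entity per thousand
CURRENT_AS_COUNT = 40_000              # "almost 40,000" Autonomous Systems


def pi_routes(population, people_per_entity):
    """Assume one PI prefix, and so one global route, per multihomed entity."""
    return population // people_per_entity


if __name__ == "__main__":
    routes = pi_routes(WORLD_POPULATION, PEOPLE_PER_MULTIHOMED_ENTITY)
    print(routes)                      # 10000000, that is, 10^7
    print(routes // CURRENT_AS_COUNT)  # 250 times today's AS count
```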

Exchange-Based Addressing

Steve Deering proposed the concept of exchange-based addressing at the IETF meeting in Stockholm in 1995, under the name Metropolitan Addressing. In this model, prefixes do not map to companies, but to Internet exchange consortia, likely regional. One organizing principle might be to associate an Internet exchange with each commercial airport worldwide, about 4,000 total, resulting in a global routing table on the same order of magnitude in size. Edge networks, including residential networks, within that domain obtain their prefixes from the exchange, and those prefixes are used by any or all ISPs in the region. Routes advertised to other regions, even within the same ISP, are aggregated to the consortium prefix.
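The aggregation rule can be illustrated with a small sketch. All prefixes here are hypothetical values drawn from the 2001:db8::/32 documentation range, and the function is an illustration of the idea rather than a description of any deployed routing policy.

```python
import ipaddress

# Hypothetical exchange consortium prefix and member edge networks.
EXCHANGE_PREFIX = ipaddress.ip_network("2001:db8:4000::/36")
EDGE_PREFIXES = [
    ipaddress.ip_network("2001:db8:4000::/48"),
    ipaddress.ip_network("2001:db8:4001::/48"),
    ipaddress.ip_network("2001:db8:4abc::/48"),
]


def advertised_routes(edge_prefixes, exchange_prefix, inside_region):
    """Return the routes visible inside or outside the exchange region."""
    if not all(p.subnet_of(exchange_prefix) for p in edge_prefixes):
        raise ValueError("edge prefix not allocated from the exchange")
    if inside_region:
        # Local ISPs carry a route for each edge network in the region.
        return sorted(edge_prefixes)
    # Other regions see only the consortium aggregate.
    return [exchange_prefix]


if __name__ == "__main__":
    print(advertised_routes(EDGE_PREFIXES, EXCHANGE_PREFIX, True))
    print(advertised_routes(EDGE_PREFIXES, EXCHANGE_PREFIX, False))
```

However many edge networks join the consortium, the remote view stays at one route per exchange, which is what keeps the global table near the number of exchanges.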

The benefits to the edge network in exchange-based addressing are similar to the benefits of PI addressing for a large corporation. In effect, the edge networks served by an exchange consortium behave like the "departments" of a "user consortium," and they enjoy great independence from their upstream providers. They can multihome or move between providers without changing their addressing, and on a global scale the routing table is contained to a small multiple of the number of such consortia.

However, the benefit to users is in most cases a detriment to their ISPs; the ISPs are forced to maintain routes to each user network served by the consortium—or at least routes for their own customers and a default route to the exchange. Thus, the complexity of routing is moved from the transit core to the access networks serving regional consortia. In addition, if there is no impediment to a user flitting among ISPs, users can be expected to flit, imposing business costs.

The biggest short-term effect on the ISP might well be the reengineering of its transit contracts. In today's Internet, a datagram sent by users to their ISPs is quickly shuttled to the destination's ISPs, which then carry it over the long haul. In an exchange-based network, there is no way to remotely determine which local ISP or ISP instance is serving a given customer. Hence, the sender's ISP carries the datagram until it reaches the remote consortium, whence it switches to the access network serving the destination. One could argue that a "sender-pays" model might have benefits, but it is very different from the present model.

The edge network has problems, too. If the edge network is sufficiently distributed, it will have services in several exchange consortia, and therefore several prefixes. Although there is nothing inherently bad about that, it may not fit the way a cloud computing environment wants to move virtual hosts around, or it may fail to meet other requirements.

Level 3 Multihoming: Shim6

The IETF's shim6 model [9] starts from the premise that edge networks obtain their prefixes from their upstream ISPs (PA addressing). If a typical residential or small business network does so, there is no question of advertising its individual route everywhere; the ISP can route internally as it needs to, but globally, the number of ISPs determines the size of the routing table. If that is, as potaroo suggests, on the order of 10,000, the size of the routing table will be on the same order of magnitude.

The benefit to the ISP should be obvious; it does not have to change its transit contracts, and although there will be other concerns, it avoids the ballooning memory costs and route exchange latencies of a very large routing table.

However, just as exchange-based addressing moves operational complexity from the transit core to the access network, shim6 moves such complexities to the edge network itself and to the hosts in it. If a network has multiple upstream providers, each LAN in it will carry a subnet from each of those providers: not one subnet per LAN, but one for each provider that the hosts on that LAN may use. At this point, the ingress filtering of RFC 3704 [21] at the provider becomes a problem at the edge; the host must select a reasonable address for any session it opens, and must do so in the absence of specific knowledge of network routing. A wrong guess can have dramatic effects; a session routed to the wrong provider may not work at all, and an unfortunate address choice can change end-to-end latency from tens of milliseconds to hundreds or worse by virtue of backbone routing.
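The interaction between source-address choice and provider ingress filtering can be shown with a minimal sketch. The provider names, prefixes, and host addresses are hypothetical, and the filter is reduced to the single check that a source address falls inside the prefix the provider delegated.

```python
import ipaddress

# Hypothetical PA prefixes delegated to one edge network by two providers.
PROVIDER_PREFIXES = {
    "isp_a": ipaddress.ip_network("2001:db8:a::/48"),
    "isp_b": ipaddress.ip_network("2001:db8:b::/48"),
}


def passes_ingress_filter(source, provider):
    """A provider accepts only sources inside the prefix it delegated."""
    return ipaddress.ip_address(source) in PROVIDER_PREFIXES[provider]


def usable_exits(source):
    """List the providers through which this source address can leave."""
    return [name for name in PROVIDER_PREFIXES
            if passes_ingress_filter(source, name)]


if __name__ == "__main__":
    # One host on one LAN holds an address from each provider's subnet.
    for address in ("2001:db8:a:1::10", "2001:db8:b:1::10"):
        print(address, usable_exits(address))
```

A session that uses the ISP A address but is routed toward ISP B fails the check, which is the "wrong guess" case: the packet is dropped at the provider edge even though the host and the LAN are working correctly.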

Application layer referrals and other application uses of addresses also have difficulties. Although the address a session is using will work both within and without the network, if a host has more than one address, one of the other addresses may be more appropriate to a given use. Hence, the application that really wants to use addresses is saddled with finding all of the addresses that its own host or a peer host might have.

There is also an opportunity. TCP today associates sessions with their source and destination addresses. The shim6 model, implemented in the Stream Control Transmission Protocol (SCTP) [17] and Multipath TCP (MPTCP) [16], allows a session to change its addresses, meaning that a session can survive a service provider outage. Doing the same in TCP requires the insertion of a shim protocol between IP and TCP; at the Internet layer, the address might change, but the shim tracks the addresses for TCP.
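The role of the shim can be pictured as a small mapping between the address pair that the transport sees and the address pair actually placed in the IP header. The class below is a conceptual sketch only: its name, fields, and failover rule are hypothetical, and it omits the context establishment, reachability probing, and security mechanisms that the real shim6 protocol [9] defines.

```python
from dataclasses import dataclass, field


@dataclass
class ShimContext:
    """Maps the fixed address pair seen by TCP to the locators in use."""
    local_id: str            # address TCP bound to when the session opened
    remote_id: str           # peer address TCP connected to
    local_locators: list     # every address this host holds
    current: tuple = field(init=False)

    def __post_init__(self):
        # Initially the locators are simply the identifiers themselves.
        self.current = (self.local_id, self.remote_id)

    def transport_view(self):
        """Address pair presented to TCP; it never changes."""
        return (self.local_id, self.remote_id)

    def wire_view(self):
        """Address pair placed in the IP header right now."""
        return self.current

    def fail_over(self, failed_locator):
        """Switch to the first local locator other than the failed one."""
        for locator in self.local_locators:
            if locator != failed_locator:
                self.current = (locator, self.current[1])
                return self.current
        raise RuntimeError("no alternative locator available")


if __name__ == "__main__":
    ctx = ShimContext("2001:db8:a:1::10", "2001:db8:ffff::1",
                      ["2001:db8:a:1::10", "2001:db8:b:1::10"])
    ctx.fail_over("2001:db8:a:1::10")   # provider A has failed
    print(ctx.transport_view())          # unchanged for TCP
    print(ctx.wire_view())               # now sourced from provider B
```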

There are, of course, ways to solve the outstanding problems. For simple cases, RFC 3484 [3, 4] describes an address-selection algorithm that has some promise. In the Japanese case, a residential host might use link-local addresses within its own network, addresses appropriate to the television service on its TV and set-top box, and an ISP's prefix for everything else. If there is more than one router in the residential LAN serving more than one ISP, exit routing can be accomplished by having the host send data using an ISP's source address to the router from which it learned the prefix. When the network becomes more complex, though, we are looking at new routing protocols that can route based on a combination of the source and the destination addresses, and we are looking at network management methodologies that make address management simpler than it is today, adding and dropping subnets on LANs—and as a result renumbering networks—without difficulty. It also implies a change to the typical host implementing the shim protocol. Those technologies either do not exist or are not widely implemented today.
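The simple exit-routing rule for a residential LAN, sending each packet to the router from which its source prefix was learned, can be sketched as follows. The prefixes and router link-local addresses are hypothetical, and the table stands in for whatever state a host keeps from the router advertisements it receives.

```python
import ipaddress

# Hypothetical state: each on-link prefix, keyed to the link-local
# address of the router that advertised it.
PREFIX_LEARNED_FROM = {
    ipaddress.ip_network("2001:db8:a:1::/64"): "fe80::a",
    ipaddress.ip_network("2001:db8:b:1::/64"): "fe80::b",
}


def first_hop_for(source):
    """Pick the router that advertised the prefix of the source address."""
    address = ipaddress.ip_address(source)
    for prefix, router in PREFIX_LEARNED_FROM.items():
        if address in prefix:
            return router
    raise LookupError("source address matches no advertised prefix")


if __name__ == "__main__":
    print(first_hop_for("2001:db8:a:1::10"))  # fe80::a
    print(first_hop_for("2001:db8:b:1::10"))  # fe80::b
```

This works only while the host is directly attached to the routers that advertise the prefixes; once there are intermediate routers, the same decision has to be made by routing on source and destination together, which is the harder case described above.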

Identifier-Locator Network Protocol

The concept of separating a host's identity from its location has been intrinsic to numerous protocol suites, including the Xerox Network Systems (XNS), Internetwork Packet Exchange (IPX), and Connectionless Network Service (CLNS) models. In the IP community, it was first proposed in Saltzer's ruminations on naming and binding, RFC 1498 [5], and in Noel Chiappa's NIMROD routing architecture, RFC 1992 [6]. In short, a host (or a set of applications running on a host, or a set of sessions it participates in) has an identifier independent of its network topology, and sessions can change network paths by simply changing the topological locations of their endpoints. Mike O'Dell, in Internet Drafts in 1996 and 1997 called 8+8 and GSE, suggested an implementation of this scenario using the prefix in the IPv6 address as a locator and the interface identifier as an identifier. One implication of the GSE model is the use of network prefix translation between an edge network and its upstream provider; whatever prefix the edge network uses internally, in the transit backbone the locator appears to be a PA prefix allocated by the ISP in question. As a result, the routing table, as in shim6, enumerates the ISPs in the network, on the order of 10,000.
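The locator and identifier split can be made concrete with a short sketch. It assumes the conventional 64-bit boundary between prefix and interface identifier; the addresses and function names are hypothetical and are not taken from the 8+8 or GSE drafts.

```python
import ipaddress

MASK64 = (1 << 64) - 1


def split(address):
    """Return (locator, identifier): the upper and lower 64 bits."""
    value = int(ipaddress.IPv6Address(address))
    return value >> 64, value & MASK64


def with_locator(address, new_prefix):
    """Rewrite the locator and keep the identifier, as a border would."""
    _, identifier = split(address)
    network = ipaddress.IPv6Network(new_prefix)
    locator = int(network.network_address) >> 64
    return ipaddress.IPv6Address((locator << 64) | identifier)


if __name__ == "__main__":
    inside = "2001:db8:a:1::1234"
    outside = with_locator(inside, "2001:db8:b:7::/64")
    print(outside)                                      # 2001:db8:b:7::1234
    print(split(inside)[1] == split(str(outside))[1])   # True
```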

The Identifier-Locator Network Protocol (ILNP) takes the solution to fruition, operating on that basic model and adding a Domain Name System (DNS) Resource Record and a random number nonce to mitigate off-path attacks that result from the fact that the IPv6 Interface Identifier (IID) is not globally unique.

As compared to the operational complexities and costs of PI Addressing, Exchange-Based Addressing, and shim6, ILNP has the advantage of being operationally simple. Each LAN has one subnet; no edge network renumbering is required when adding or changing providers; and, as noted, the cost of the global routing table does not increase. Additionally, it is trivial to load-share traffic across points of attachment to multiple ISPs, because the locator is irrelevant above the network layer. And unlike IPv4/IPv4 Network Address Port Translation (NAPT), the translation is stateless; as a result, sessions using IP Security (IPsec) Encapsulating Security Payload (ESP) encryption can cross it.

In this case, the complexities of the network are transferred to the application itself, and to its transport. The application must, in some sense, know all of its "outside" addresses. It can learn them, of course, by using its domain name in referrals and other uses of the address; in some cases, however, the application really wants to know the address itself. If it is communicating those addresses to other applications (the usual usage), the assumption that its view of its address is meaningful to its remote peer is, in the words of RFC 3424 [8], Unilateral Self-Address Fixing (UNSAF), and the concerns raised in RFC 2993 [7] are the result. To mitigate those concerns, ILNP excludes the locator from the TCP and User Datagram Protocol (UDP) pseudo-headers (and as a result from the checksum).
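The effect of leaving the locator out of the pseudo-header can be demonstrated with a simplified checksum. This sketch is not the ILNP wire format: it assumes a 64-bit identifier in the low half of each address, and it omits the length and next-header fields that a real TCP or UDP pseudo-header carries.

```python
import ipaddress
import struct

MASK64 = (1 << 64) - 1


def internet_checksum(data):
    """16-bit ones' complement of the ones' complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def identifier_bytes(address):
    """Low 64 bits of an IPv6 address (assumed identifier), as 8 bytes."""
    value = int(ipaddress.IPv6Address(address)) & MASK64
    return value.to_bytes(8, "big")


def checksum_without_locator(src, dst, payload):
    """Checksum over the two identifiers and the payload only."""
    pseudo_header = identifier_bytes(src) + identifier_bytes(dst)
    return internet_checksum(pseudo_header + payload)


if __name__ == "__main__":
    peer = "2001:db8:ffff::1"
    via_a = checksum_without_locator("2001:db8:a:1::1234", peer, b"hello")
    via_b = checksum_without_locator("2001:db8:b:7::1234", peer, b"hello")
    print(via_a == via_b)  # True: changing the locator leaves it unchanged
```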

The implication of ILNP is, as a result, that TCP and UDP must be either changed or exchanged for other protocols such as Stream Control Transmission Protocol (SCTP) or Multipath TCP (MPTCP), and that applications must either use DNS names when referring to themselves or other systems in their network—sharply dividing between the application and network layers—or devise a means by which they can determine the full set of their "outside" addresses.

Network Prefix Translation, Also Known as NAT66

Like ILNP, Network Prefix Translation (NPTv6) derives from and can be considered a descendant of the GSE model. It differs from ILNP in that it defines no DNS Resource Record, defines no end-to-end nonce, and requires no change to the host, especially its TCP/UDP stacks. To achieve that, the translator folds a compensating checksum adjustment into the translated source and destination addresses, so that the TCP/UDP checksum remains valid without being rewritten.

If the ISP prefix is a /48 prefix, this prefix allows for load sharing of sessions across translators leading to multiple ISPs; if the ISP prefix is longer, such as a /56 or /60, the checksum update must be done in the IID, and as a result load sharing can be accomplished only across translators between the same two networks. Like ILNP and unlike IPv4/IPv4 NAPT, the translation is stateless; as a result, sessions using IPsec ESP encryption can cross it.
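The checksum-neutral rewrite for the /48 case can be sketched in a few lines. This is a minimal illustration, assuming both prefixes are /48 so that the adjustment lands in the 16 bits that follow the prefix; the function names are hypothetical, and longer prefixes (where the adjustment moves into the IID) are deliberately not handled.

```python
import ipaddress


def words(value):
    """Split a 128-bit integer into eight 16-bit words, high word first."""
    return [(value >> shift) & 0xFFFF for shift in range(112, -1, -16)]


def add1c(a, b):
    """Ones' complement addition of two 16-bit values."""
    total = a + b
    return (total & 0xFFFF) + (total >> 16)


def sum1c(values):
    """Ones' complement sum of a list of 16-bit words."""
    total = 0
    for value in values:
        total = add1c(total, value)
    return total


def translate(address, from_prefix, to_prefix):
    """Swap one /48 for another, compensating in the following 16 bits."""
    old_net = ipaddress.IPv6Network(from_prefix)
    new_net = ipaddress.IPv6Network(to_prefix)
    if old_net.prefixlen != 48 or new_net.prefixlen != 48:
        raise ValueError("this sketch handles only /48 prefixes")
    addr = ipaddress.IPv6Address(address)
    if addr not in old_net:
        raise ValueError("address is not inside from_prefix")
    w = words(int(addr))
    old = words(int(old_net.network_address))[:3]
    new = words(int(new_net.network_address))[:3]
    # Adjustment = sum(old prefix) - sum(new prefix), ones' complement.
    adjustment = add1c(sum1c(old), ~sum1c(new) & 0xFFFF)
    subnet = add1c(w[3], adjustment)
    if subnet == 0xFFFF:          # normalize the second form of zero
        subnet = 0
    value = 0
    for word in new + [subnet] + w[4:]:
        value = (value << 16) | word
    return ipaddress.IPv6Address(value)


if __name__ == "__main__":
    inside, outside = "fd01:203:405::/48", "2001:db8:1::/48"
    out = translate("fd01:203:405:1::1234", inside, outside)
    print(out)                               # 2001:db8:1:d550::1234
    print(translate(str(out), outside, inside))  # fd01:203:405:1::1234
```

Because the ones' complement sum of the whole address is the same before and after translation, any transport checksum that covers the address through the pseudo-header still verifies, and the translator keeps no per-session state.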

The complexities of the network are again transferred to the application itself, but not to its transport. The application must, in some sense, know all of its "outside" addresses. It can learn them by using its domain name in referrals and other uses of the address; in some cases, however, the application really wants to know the address itself. If it is communicating those addresses to other applications (the usual usage), the assumption that its view of its address is meaningful to its remote peer is, again in the words of RFC 3424 [8], "UNSAF," and some of the concerns raised in RFC 2993 [7] result.

The implication of NPTv6 is that applications must either use DNS names when referring to themselves or other systems in their network (sharply dividing between the application and network layers) or devise a means by which they can determine the full set of their "outside" addresses. However, the IPv6 goal of enabling any system in the network to communicate with any other, given administrative support, is retained.

Ways Forward

From the perspective of this author, the choice of multihoming technology will in the end be an operational choice. The practice of multihoming is proliferating and will continue to do so. There is a place for provider-independent addressing; it may not in reality make sense for 40,000 companies, but it probably does for the largest edge networks. At the other extreme, shim6-style multihoming makes sense in residential networks with a single LAN; as described earlier, there are simple approaches to making that work through reasonable policy approaches.

For the vast majority of networks in between, policy suggestions that do not substantially benefit the network or users who implement them do not have a good track record. Hence, while Exchange-Based Addressing materially assists in edge network problems, there is no substantive reason to believe that the transit backbone will implement it. Similarly, although shim6 materially helps with the capital and operational expenses of operating the transit backbone, it is not likely that edge networks will implement it.

We also have a poor track record in changing host software. For example, SCTP is in many respects a superior transport protocol to TCP: it allows for multiple streams, it is divorced from network layer addressing, and it allows endpoints to change their addresses midsession. Yet in a 2009 "Train Wreck" workshop at Stanford University, in which various researchers argued all day in favor of the development of a new transport with requirements much like those of SCTP, the research community acted as if ignorant of it when the protocol was brought up in conversation.

NPTv6 is not a perfect solution, but this author suspects that it will be operationally simple enough to deploy and manage and close enough to the requirements of edge networks and applications that it will, in fact, address the topic of multihoming.

References

[1] Craig Partridge and Frank Kastenholz, "Technical Criteria for Choosing IP The Next Generation (IPng)," RFC 1726, December 1994.

[2] Joe Abley, Benjamin Black, and Vijay Gill, "Goals for IPv6 Site-Multihoming Architectures," RFC 3582, August 2003.

[3] Richard Draves, "Default Address Selection for Internet Protocol version 6 (IPv6)," RFC 3484, February 2003.

[4] Arifumi Matsumoto, Jun-ya Kato, and Tomohiro Fujisaki, "Update to RFC 3484 Default Address Selection for IPv6," Internet Draft, Work in Progress, March 2011, http://tools.ietf.org/html/draft-ietf-6man-rfc3484-revise

[5] Jerome Saltzer, "On the Naming and Binding of Network Destinations," RFC 1498, August 1993.

[6] Isidro Castineyra, Noel Chiappa, and Martha Steenstrup, "The Nimrod Routing Architecture," RFC 1992, August 1996.

[7] Tony Hain, "Architectural Implications of NAT," RFC 2993, November 2000.

[8] Leslie Daigle, Ed., and IAB, "IAB Considerations for UNilateral Self-Address Fixing (UNSAF) Across Network Address Translation," RFC 3424, November 2002.

[9] Erik Nordmark and Marcelo Bagnulo, "Shim6: Level 3 Multihoming Shim Protocol for IPv6," RFC 5533, June 2009.

[10] Ole Troan, David Miles, Satoru Matsushima, Tadahisa Okimoto, and Dan Wing, "IPv6 Multihoming without Network Address Translation," Internet Draft, Work in Progress, http://tools.ietf.org/html/draft-ietf-v6ops-ipv6-multihoming-without-ipv6nat

[11] Margaret Wasserman and Fred Baker, "IPv6-to-IPv6 Network Prefix Translation," Internet Draft, Work in Progress, http://tools.ietf.org/html/draft-mrw-nat66

[12] Ran Atkinson and Scott Rose, "DNS Resource Records for ILNP," Internet Draft, Work in Progress, http://tools.ietf.org/html/draft-rja-ilnp-dns

[13] Ran Atkinson, "ICMP Locator Update message," Internet Draft, Work in Progress, http://tools.ietf.org/html/draft-rja-ilnp-icmp

[14] Ran Atkinson, "ILNP Concept of Operations," Internet Draft, Work in Progress, http://tools.ietf.org/html/draft-rja-ilnp-intro

[15] Ran Atkinson, "ILNP Nonce Destination Option," Internet Draft, Work in Progress, http://tools.ietf.org/html/draft-rja-ilnp-nonce

[16] Alan Ford, Costin Raiciu, Mark Handley, and Olivier Bonaventure, "TCP Extensions for Multipath Operation with Multiple Addresses," Internet Draft, Work in Progress, http://tools.ietf.org/html/draft-ietf-mptcp-multiaddressed

[17] Randall Stewart, Ed., "Stream Control Transmission Protocol," RFC 4960, September 2007.

[18] Randall Stewart, Qiaobing Xie, Michael Tuexen, Shin Maruyama, and Masahiro Kozuka, "Stream Control Transmission Protocol (SCTP) Dynamic Address Reconfiguration," RFC 5061, September 2007.

[19] Jon Postel, "User Datagram Protocol," RFC 768, August 1980.

[20] Jon Postel, "Transmission Control Protocol," RFC 793, September 1981.

[21] Fred Baker and Pekka Savola, "Ingress Filtering for Multihomed Networks," RFC 3704 [BCP 84], March 2004.

[22] David Meyer, "The Locator Identifier Separation Protocol (LISP)," The Internet Protocol Journal, Volume 11, No. 1, March 2008.

FRED BAKER, a Cisco Fellow, has been active in technology development and Internet standardization since the 1980s. He participated in early development of IEEE 802.1d switching and IP routing. In the IETF, he has written or edited RFCs on a variety of topics, and chaired both working groups and the IETF itself. At this time, he is the IETF's Voting Member on the U.S. NIST Smart Grid Interoperability Panel, a member of the SGIP's Architecture Committee, and co-chair of the IETF IPv6 Operations Working Group. At Cisco, his group supports research at universities; he is looked to for research advice and mentorship both within and outside the company. E-mail: fred@cisco.com