by Dan Wing and Andrew Yourtchenko, Cisco Systems
To be successful, new technologies must improve the user experience. In the process of finding the best way to deploy a new technology, several approaches are typically conceived, written down, tried, and possibly discarded. This article addresses two such approaches for Internet Protocol Version 6 (IPv6) and the Stream Control Transmission Protocol (SCTP).
Modern web browsers, web servers, and operating systems support IPv4 and IPv6, and several major content providers already support IPv6, including Google, Netflix, and Facebook. However, their properties are not generally available over IPv6 because of a conflict between IPv6 technology and their business realities.
The technology in web browsers and operating systems involves doing Domain Name System (DNS) queries for AAAA and A resource records and then attempting to connect to the resulting IPv6 and IPv4 addresses sequentially. If the IPv6 path is broken (or slow), this connection can take a long time before it falls back to trying IPv4. This process is especially painful on typical websites that retrieve objects from different hosts, because each failure incurs a delay. The combination of operating system and web browser results in delays from 20 seconds to several minutes if the IPv6 path is broken. The typical message flow of a TCP client is shown in Figure 1. Clearly, this delay is unacceptable to users. Users avoid this delay by disabling IPv6 or avoiding IPv6-enabled websites.
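The sequential behavior described above can be sketched in a few lines of Python. This is only an illustration, not the code of any particular browser or operating system: the function name `connect_sequentially` and the 20-second default timeout are assumptions chosen to match the lower bound of the delays quoted above.

```python
import socket


def connect_sequentially(host, port, per_attempt_timeout=20.0):
    """Try each resolved address in order and return the first connected socket.

    The timeout value is an illustrative assumption, not a measured default.
    """
    last_error = None
    # On many dual-stack hosts the resolver returns IPv6 (AAAA) results
    # ahead of IPv4 (A) results, so IPv6 is usually attempted first.
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(per_attempt_timeout)
        try:
            # On a broken path this call can block for the whole timeout
            # before the loop moves on to the next address.
            sock.connect(sockaddr)
            return sock
        except OSError as exc:
            last_error = exc
            sock.close()
    raise last_error or OSError("no addresses resolved")
```

Because each attempt must fail before the next one starts, a page that pulls objects from several hosts pays the timeout once per host when the IPv6 path is broken.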
The problem of broken IPv6 networks is relatively widespread. Providing content is a business, either directly (for example, streaming movies) or indirectly (for example, selling advertising). If users suffer delays viewing IPv6-enabled content (because of the technology reasons described previously), they will have an incentive to visit other websites. This scenario means lost revenue and is unacceptable to the business. Considering that all of the customers on today's Internet can reach IPv4 content, it is a business risk to enable IPv6 because some customers will suffer delays attempting to view IPv6 websites. Major content providers have been monitoring the situation and have published results showing that the IPv6 failure rate is too high to enable IPv6 AAAA for their content.
IPv6 problems have several causes. IPv6 is newer technology, and monitoring of IPv6 connectivity is not yet on par with that of IPv4. Single-point tunnels, unmanaged tunnels, accidentally misconfigured firewalls, and router and link failures can more easily cause outages on IPv6. Many applications remain IPv4-only, or network administrators are relying on dual-stack equipment to transparently fail over to IPv4 during IPv6 outages.
However, such failover is never transparent to users—it takes many seconds or minutes! To avoid these problems, the content provider has only one choice: don't provide AAAA records if users might experience broken or slow IPv6.
To work around that problem, Google implements a white list of DNS servers to which it will provide AAAA records. However, in its current incarnation, DNS white listing does not scale well because the Internet Service Provider (ISP) has to prove good IPv6 connectivity to Google, and then Google white lists the ISP's DNS servers to receive the AAAA records. The scaling problem is that there are thousands of ISPs around the world, and white listing and de-white listing them becomes a tiresome manual task for both ISPs and Google. Furthermore, if every content provider did DNS white listing, ISPs would have to work with several content providers in order to give value to the IPv6 network they have deployed to their subscribers! Content providers have started working together to consolidate requirements for DNS white listing and operate some sort of DNS white-listing service to slightly automate this process.
Yet, DNS white listing still does not guarantee a working IPv6 network or a fast IPv6 network, because there is not a direct relationship between good IPv6 connectivity and the DNS server of a user's ISP. Even with the best of intentions and network design, there will still be instances where an IPv6 path or IPv4 path is working when the other path is broken. The result will be excessive delays for IPv4-only clients or dual-stack clients, depending on what sort of breakage occurs.
This situation contributes to the user perception that the Internet, or the particular website being accessed, is "down." The user will visit a different site instead, possibly never returning to the site that was "down."
A different approach solves these problems. In this approach, rather than an application slowly trying to make a connection on IPv6 and then on IPv4, the application makes its connection attempts more aggressively over both IPv6 and IPv4. Initially, the connection attempts are made simultaneously (rather than serialized), in order to provide a fast user experience.
The simultaneous connection attempts consume a little extra network bandwidth and twice the connection attempts on the server. To reduce that chatter, a cache is also maintained to store the success or failure of connecting using IPv6 or IPv4. We nickname this approach "Happy Eyeballs", because the "eyeballs" (users) are happier: their computer provides them immediate content, even if the network is suffering slow performance on IPv6 or IPv4 (Figure 2).
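A minimal sketch of the simultaneous attempts, using Python's standard `asyncio` library, might look like the following. The function name `race_connect` is hypothetical, and the sketch covers only the racing step; it is one possible reading of the approach described above, not a reference implementation.

```python
import asyncio
import socket


async def race_connect(host, port, families=(socket.AF_INET6, socket.AF_INET)):
    """Start one TCP connection attempt per address family; keep the first success."""
    # Map each in-flight attempt to the address family it is using.
    attempts = {
        asyncio.create_task(
            asyncio.open_connection(host, port, family=family)): family
        for family in families
    }
    pending = set(attempts)
    try:
        while pending:
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED)
            winner = None
            for task in done:
                if task.exception() is not None:
                    continue                  # this family failed; keep waiting
                if winner is None:
                    winner = task
                else:
                    task.result()[1].close()  # finished together; not needed
            if winner is not None:
                reader, writer = winner.result()
                return attempts[winner], reader, writer
        raise OSError(f"could not connect to {host!r} over any address family")
    finally:
        for task in pending:
            task.cancel()                     # abandon the slower attempt(s)
```

A caller would await `race_connect(host, port)` inside a coroutine and receive the winning address family together with the stream pair. A failure on one family is simply ignored as long as the other family connects, which is what hides a broken path from the user.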
Obviously, sending a TCP SYN on both IPv6 and IPv4 doubles the number of connection attempts sent by the client. This chatter can be reduced by the application remembering if IPv6 (or IPv4) was successful in the previous connection attempt, and using that information for subsequent connection attempts. The sophistication of this cache is dependent on the memory (or disk) available, but even simple caching can be quite effective. When connecting to a new network (third generation [3G], different Wi-Fi network, or physical Ethernet), the connectivity of that new network can be determined and the cache of success or failure entirely or partially flushed, as necessary.
Thus, the doubling of connection attempts occurs only when connecting to a new network. Thereafter, initial connection attempts are delayed so that IPv6 (or IPv4) is tried first. But in all cases, significant user-noticeable delays are avoided when IPv6 (or IPv4) is broken. The goal of Happy Eyeballs is to keep IPv6 enabled; that is, to make users unaware of IPv6 outages, so the user still visits IPv6-enabled websites without suffering any delay.
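The caching behavior can be illustrated with a very small in-memory structure. Everything named here is hypothetical: the class `FamilyCache`, the function `attempt_plan`, and the 0.3-second head start are assumptions made for the sketch, and a real cache could be keyed and flushed quite differently.

```python
class FamilyCache:
    """Remember which address family last succeeded for each destination."""

    def __init__(self):
        self._last_success = {}          # host -> "IPv6" or "IPv4"

    def record(self, host, family):
        self._last_success[host] = family

    def preferred(self, host):
        return self._last_success.get(host)

    def flush(self):
        """Forget everything, for example after attaching to a new network."""
        self._last_success.clear()


def attempt_plan(cache, host, head_start=0.3):
    """Return (family, start delay in seconds) pairs for the next connection.

    With no cached result both families start at once. Otherwise the family
    that worked last time starts immediately and the other family is held
    back by head_start (an assumed value) as a safety net.
    """
    preferred = cache.preferred(host)
    if preferred is None:
        return [("IPv6", 0.0), ("IPv4", 0.0)]
    other = "IPv4" if preferred == "IPv6" else "IPv6"
    return [(preferred, 0.0), (other, head_start)]
```

In this sketch the second attempt is only delayed, never dropped, so a path that breaks after it was cached as working still costs the user no more than the head start.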
Another idea to determine if IPv6 is working is to ping or send another simple request to an IPv6 resource on the Internet, and disable IPv6 on the host if that IPv6 request fails. This approach interferes with IPv6 traffic within the enterprise (which may be working fine, whereas IPv6 to the Internet is broken), and disabling IPv6 would break IPv6 features deployed in OSs (for example, DirectAccess in Windows or Back to My Mac in Mac OS X). An advantage of this approach is that if IPv6 is disabled, no application suffers the IPv6 outage and associated delay to fall back to IPv4.
New Transport: SCTP
A selection problem similar to the one at the network layer also exists at the transport layer. Perhaps surprisingly, another transport protocol exists alongside TCP, namely the Stream Control Transmission Protocol (SCTP). SCTP provides significant advantages over TCP, and it was designed with some of the lessons learned from TCP implementations and deployment in mind.
Unlike IPv6 and IPv4, which have different DNS resource records (AAAA and A), we don't have a resource record to indicate that an application could, or should, use a different transport protocol. But even if we could indicate support for SCTP in DNS, the path might block it, reducing the usefulness of a DNS resource record. The path could be blocked by a NAT or firewall that expects only TCP or User Datagram Protocol (UDP).
Happy Eyeballs also describes a technique where a client can simultaneously try connecting using both TCP and SCTP. By necessity, this attempt is done entirely in the application, and the application would prefer the transport that responded faster and cache that information to reduce network chatter for subsequent connections to that server. This scenario is shown in Figure 3.
By combining the IPv6/IPv4 technique with the SCTP/TCP technique, a web browser running on a computer connected to a new dual-stack network sends four packets: an IPv4 TCP SYN, an IPv6 TCP SYN, an IPv4 SCTP INIT, and an IPv6 SCTP INIT. Based on the responses, it decides which transport protocol and which address family (IPv6 or IPv4) it prefers, and abandons the other connections. As described previously, connection information is cached for subsequent use to avoid consuming network bandwidth and server resources for subsequent network connections.
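The four-way combination can be illustrated without opening any sockets. The sketch below only enumerates the initial packets and picks a winner from response times supplied by the caller. The function names are hypothetical, and the "fastest response wins" rule is an assumption: the text says the decision is based on the responses without fixing an exact policy.

```python
FIRST_PACKET = {"TCP": "SYN", "SCTP": "INIT"}


def initial_packets():
    """List the packets sent on a new dual-stack network."""
    return [
        f"{family} {transport} {FIRST_PACKET[transport]}"
        for transport in ("TCP", "SCTP")
        for family in ("IPv4", "IPv6")
    ]


def pick_winner(response_times):
    """Choose the (family, transport) pair that answered first.

    response_times maps (family, transport) to seconds until a reply,
    or to None when no reply arrived (for example, a firewall dropped SCTP).
    Returns None if nothing answered.
    """
    answered = {pair: t for pair, t in response_times.items() if t is not None}
    if not answered:
        return None
    return min(answered, key=answered.get)
```

For example, if a firewall on the path drops SCTP and IPv6 answers slightly sooner than IPv4, `pick_winner` returns `("IPv6", "TCP")`, and that pair is what would be cached for later connections.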
New technology aimed at improving user experience will be successful only if it meets expectations—an improved user experience. Because many companies are deriving all of their revenue from the Internet, any reduction in service means a loss of revenue. Thus, deploying new technology must not negatively affect the user experience. This article described one of the mechanisms that implementers can use to avoid negative effects on the user experience.